Unit Testing Network Infrastructure w/ pyATS

There’s nothing better than a greenfield deployment. The infrastructure has been deployed to the ‘Gold Standard’ in terms of design practices and all required optimisations have been put in place from a network engineering perspective to ensure the best user and application experience.

In reality, these infrastructures don’t live in isolation. Moves, adds, and changes often cause drift from the initial design, and verbose documentation that outlines the configuration of the environment quickly becomes obsolete as the network adapts to its new requirements.

From an operational perspective, the optimal operating state of the infrastructure gets lost behind an over-reliance on high-level monitoring tools. NOC teams often resort to ‘blob-watching’ (dots go from green to red) as opposed to ensuring the network is operating at its ‘Gold Standard’. What even is the intended operating state of the infrastructure, and how can we quickly determine drift? Cross-referencing that initial design document against CLI show commands? Forget about it.

pyATS

This is where pyATS comes in. pyATS is a Python-based testing framework geared towards network infrastructure. (There are a number of abstractions and additional modules in the pyATS ecosystem, such as Genie parsers and a CLI wrapper; this writeup focuses on the core Python library.) Developed and maintained by Cisco internal engineering, it’s now (and has been for a while) available for the outside world to use. And yes, it supports multiple vendors! Just check out the Unicon docs (https://pubhub.devnetcloud.com/media/unicon/docs/user_guide/supported_platforms.html); Unicon is the underlying connection library that pyATS uses.

Take the below topology:

I have a number of unit tests that validate the optimal operating state of the infrastructure.

  • Both R1 and R2 should have exactly 1 OSPF peering. No fewer, no more.
  • The next hop for R2’s Loopback from R1 should be via 10.250.10.2
  • The next hop for R1’s Loopback from R2 should be via 10.250.10.1
  • NXOS-A should have 2 eBGP peers in an ‘Established’ state.

Let’s code it up!

pyATS requires that the infrastructure you are testing against be defined in a YAML testbed file. In my case, this is my_testbed.yaml:

testbed:
    name: my_topology
    credentials:
        default:
            username: craig
            password: Redacted!
        enable:
            password: Redacted!

devices:
    R1: 
        os: ios
        type: ios
        connections:
            vty:
                protocol: ssh
                ip: 192.168.137.35
    R2:
        os: ios
        type: ios
        connections:
            vty:
                protocol: ssh
                ip: 192.168.137.36
    NXOS-A:
        os: nxos
        type: switch
        connections:
            rest:
                class: rest.connector.Rest
                ip: 192.168.137.37
                credentials:
                    rest:
                        username: craig
                        password: Redacted!

For R1 and R2 we are using the SSH connector within Unicon; for NXOS-A, however, I decided to leverage the REST API for something a bit different.

In order to connect to device APIs with pyATS/Unicon, you will have to install the REST connector package (https://developer.cisco.com/docs/rest-connector/).

The code below is our actual ‘test script’. In it we code up the test requirements listed above, which describe our infrastructure’s operational gold standard.

Tests are decorated with the aetest.test decorator. We can write individual tests, or loop a common test case over several infrastructure components with the aetest.loop decorator.

from pyats import aetest
import re

class CommonSetup(aetest.CommonSetup):

    @aetest.subsection
    def check_topology(self,
                       testbed):
        ios1 = testbed.devices['R1']
        ios2 = testbed.devices['R2']
        nxos_a = testbed.devices['NXOS-A']

        self.parent.parameters.update(ios1 = ios1, ios2 = ios2,nxos_a = nxos_a)


    @aetest.subsection
    def establish_connections(self, steps, ios1, ios2, nxos_a):
        with steps.start('Connecting to %s' % ios1.name):
            ios1.connect()

        with steps.start('Connecting to %s' % ios2.name):
            ios2.connect()

        with steps.start('Connecting to %s' % nxos_a.name):
            # alias='rest' exposes the connection as nxos_a.rest, as used by the BGP tests
            nxos_a.connect(via='rest', alias='rest')

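@aetest.loop(device = ('ios1', 'ios2'))
class CommonOspfValidation(aetest.Testcase):
    @aetest.test
    def NeighborCount(self,device):
        try:
            result = self.parameters[device].execute('show ip ospf neighbor summary')
        except Exception as e:
            # Fail the test instead of silently passing when the command cannot run
            self.failed('could not collect OSPF neighbors from %s: %s' % (device, e))

        else:
            match = re.search(r"FULL.+(\d)",result)
            if match is None:
                self.failed('no FULL neighbor count found on %s' % device)
            assert int(match.group(1)) == 1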

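class R1_ValidateEgressTransit(aetest.Testcase):
    @aetest.test
    def R2NextHop(self,ios1):
        try:
            result = ios1.execute('show ip route ospf')
        except Exception as e:
            # Fail the test instead of silently passing when the command cannot run
            self.failed('could not collect routes from R1: %s' % e)

        else:
            # Raw string with escaped dots, so '.' only matches a literal dot
            match = re.search(r"2\.2\.2\.2.+via (\d+\.\d+\.\d+\.\d+)",result)
            if match is None:
                self.failed('no OSPF route to 2.2.2.2 found')
            nextHop = match.group(1)
            assert nextHop == '10.250.10.2'

class R2_ValidateEgressTransit(aetest.Testcase):
    @aetest.test
    def R1NextHop(self,ios2):
        try:
            result = ios2.execute('show ip route ospf')
        except Exception as e:
            # Fail the test instead of silently passing when the command cannot run
            self.failed('could not collect routes from R2: %s' % e)

        else:
            # Raw string with escaped dots, so '.' only matches a literal dot
            match = re.search(r"1\.1\.1\.1.+via (\d+\.\d+\.\d+\.\d+)",result)
            if match is None:
                self.failed('no OSPF route to 1.1.1.1 found')
            nextHop = match.group(1)
            assert nextHop == '10.250.10.1'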

class NXOS_A_Unit_Tests(aetest.Testcase):
    @aetest.test
    def peer_r1(self,nxos_a):
        try:
            result = nxos_a.rest.get('/api/mo/sys/bgp/inst/dom-default/peer-[10.250.100.2]/ent-[10.250.100.2].json')
        except Exception as e:
            # Fail the test instead of silently passing when the API call fails
            self.failed('could not query peer 10.250.100.2: %s' % e)

        else:
            operState = (result['imdata'][0]['bgpPeerEntry']['attributes']['operSt'])
            if operState != 'established':
                self.failed('peer not established')

    @aetest.test
    def peer_r2(self,nxos_a):
        try:
            result = nxos_a.rest.get('/api/mo/sys/bgp/inst/dom-default/peer-[10.250.100.6]/ent-[10.250.100.6].json')
        except Exception as e:
            # Fail the test instead of silently passing when the API call fails
            self.failed('could not query peer 10.250.100.6: %s' % e)

        else:
            operState = (result['imdata'][0]['bgpPeerEntry']['attributes']['operSt'])
            if operState != 'established':
                self.failed('peer not established')

class CommonCleanup(aetest.CommonCleanup):

    @aetest.subsection
    def disconnect(self, steps, ios1, ios2):
        with steps.start('Disconnecting from %s' % ios1.name):
            ios1.disconnect()

        with steps.start('Disconnecting from %s' % ios2.name):
            ios2.disconnect()

if __name__ == '__main__':
    import argparse
    from pyats.topology import loader

    parser = argparse.ArgumentParser()
    parser.add_argument('--testbed', dest = 'testbed',
                        type = loader.load)

    args, unknown = parser.parse_known_args()

    # Hand the loaded testbed to aetest and run the script standalone
    aetest.main(**vars(args))
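
The two next-hop tests above pull an address out of ‘show ip route ospf’ with a regular expression. As a minimal sketch of that parsing step on its own, the helper below (ospf_next_hop is a hypothetical name, not part of pyATS) escapes the prefix so that ‘.’ only matches a literal dot, and returns None rather than raising when no route is found. The sample output is illustrative IOS-style text written for this sketch, not captured from the lab above.

```python
import re
from typing import Optional

# Illustrative IOS-style output, written for this sketch (an assumption, not lab data).
SAMPLE_OUTPUT = """\
      2.0.0.0/32 is subnetted, 1 subnets
O        2.2.2.2 [110/2] via 10.250.10.2, 00:05:11, GigabitEthernet0/1
"""


def ospf_next_hop(output: str, prefix: str) -> Optional[str]:
    """Return the next hop for `prefix` from 'show ip route ospf' text, or None."""
    pattern = re.compile(
        # A line starting with the OSPF route code, then the exact prefix
        # (optionally with a /mask), then the address that follows 'via'.
        r"^O.*?[ \t]" + re.escape(prefix) + r"(?:/\d+)?[ \t].*?via (\d+\.\d+\.\d+\.\d+)",
        re.MULTILINE,
    )
    match = pattern.search(output)
    return match.group(1) if match else None


print(ospf_next_hop(SAMPLE_OUTPUT, "2.2.2.2"))  # prints 10.250.10.2
```

Inside a test, the returned value could be compared against the expected next hop, with a None result treated as a failure.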

The script follows a Setup/Test/Cleanup structure, with the unit tests, and therefore the test logic, defined in the Test section. Let’s examine a single unit test from our script. Below is the unit test for NXOS-A’s peering to R1.

class NXOS_A_Unit_Tests(aetest.Testcase):
    @aetest.test
    def peer_r1(self,nxos_a):
        try:
            result = nxos_a.rest.get('/api/mo/sys/bgp/inst/dom-default/peer-[10.250.100.2]/ent-[10.250.100.2].json')
        except Exception as e:
            # Fail the test instead of silently passing when the API call fails
            self.failed('could not query peer 10.250.100.2: %s' % e)

        else:
            operState = (result['imdata'][0]['bgpPeerEntry']['attributes']['operSt'])
            if operState != 'established':
                self.failed('peer not established')

  1. We connect to the API of NXOS-A and perform a GET request for the status of the peer 10.250.100.2.
  2. We examine the result of the API request; if the state is not ‘established’, we fail the unit test.
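
The second step can be sketched in isolation, without a device. The helper below (bgp_peer_state is a hypothetical name) walks the same keys the test indexes into and returns None if any of them is missing, for example when the peer object does not exist. The two payloads are hand-written assumptions shaped like those keys, not real API responses.

```python
from typing import Any, Dict, Optional


def bgp_peer_state(response: Dict[str, Any]) -> Optional[str]:
    """Return operSt from a bgpPeerEntry response, or None if it is absent."""
    try:
        return response["imdata"][0]["bgpPeerEntry"]["attributes"]["operSt"]
    except (KeyError, IndexError, TypeError):
        # Missing peer or unexpected response shape
        return None


# Hypothetical payloads for illustration only.
established = {"imdata": [{"bgpPeerEntry": {"attributes": {"operSt": "established"}}}]}
missing = {"imdata": []}

print(bgp_peer_state(established))  # prints established
print(bgp_peer_state(missing))      # prints None
```

In the test itself, anything other than 'established', including None, would then lead to self.failed().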

We run our test script and specify the testbed YAML file at runtime with the --testbed flag.

The results of our unit tests are below with the status of each one in a nice tree format. It looks like all our unit tests have passed!

Let’s break something: we’ll make a configuration change to break the peering from NXOS-A to R1, then run these tests again.

We can see that the unit tests have now failed, specifically on the NXOS_A unit tests for peer R1. If we look at the log output we can see our failure reason.

Summary

Awesome! Again, this was a simplistic example, but as with anything in code, the only limit is your imagination.

All code is available on my GitHub.
