Model-Driven Streaming Telemetry with TIG Stack (IOS-XE)

In the SDN/NetDevOps era, it would be unfair to leave network monitoring behind. Monitoring and general network ‘observability’ are going through just as much of a transformation as the configuration management of the devices themselves.

SNMP, although highly structured by design, can be a bit of a management nightmare for the uninitiated. Trawling through endless OID structures, dealing with unsupported MIBs, and battling protocol inefficiencies are just a few of the drawbacks one faces when using SNMP to target specific data.

With the advent of YANG data modeling, we can use essentially the same data models to declare our configuration intent and to specify the data we want to retrieve for network observability. Goodbye to the dark art of OID trawling! One data modeling language to rule them all.

So YANG can help us easily identify the data we want, but how about visualising it? This is where the TIG stack comes in. TIG is made up of three open-source components. Without getting too heavy, their general purpose is outlined below.

  • Telegraf – The Data Collection Piece
  • InfluxDB – The Data Storage Piece
  • Grafana – The Data Visualisation Piece

Getting a TIG stack instance up and running for testing/labbing is pretty straightforward: we can use docker-compose to quickly stand up our stack (the docker-compose file is on my GitHub).

Let’s ensure the Telegraf configuration is set to receive our streaming telemetry data before standing up our environment.

In the telegraf.conf file we are leveraging the built-in cisco_telemetry_mdt plugin. Let’s use gRPC as the transport because it’s edgy and cool, so why not? Let’s set Telegraf to listen on port 57000 for incoming gRPC messages from our hosts.
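For reference, here is a rough sketch of what the relevant input stanza might look like, rendered from Python so the port is easy to change. This is not a copy of the telegraf.conf linked at the end of the post; `render_mdt_input` is a made-up helper name, and the only settings shown are the plugin's `transport` and `service_address` options.

```python
def render_mdt_input(port: int = 57000, transport: str = "grpc") -> str:
    """Render a minimal Telegraf input stanza for Cisco model-driven telemetry.

    Hypothetical helper for illustration only; a real telegraf.conf also
    needs an output section (for example, one pointing at InfluxDB).
    """
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    lines = [
        "[[inputs.cisco_telemetry_mdt]]",
        "  ## Transport the routers dial out with",
        f'  transport = "{transport}"',
        "  ## Listen on all interfaces on this port",
        f'  service_address = ":{port}"',
    ]
    return "\n".join(lines) + "\n"


print(render_mdt_input())
```

The printed stanza can be pasted into telegraf.conf; whatever port you pick here has to match the receiver port configured on the routers later on.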

After running our docker-compose up, we can see that all 3 elements of the TIG stack are up and running and exposing the ports specified.
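If you would rather check this from a script than by eye, a plain TCP connect test is a reasonable sketch. The port numbers are assumptions: 57000 comes from the Telegraf setup above, while 8086 and 3000 are the usual InfluxDB and Grafana defaults and may not match your docker-compose file.

```python
import socket

# Assumed host ports; adjust to whatever your docker-compose file publishes.
STACK_PORTS = {"telegraf": 57000, "influxdb": 8086, "grafana": 3000}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_stack(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its port accepts TCP connections."""
    return {name: port_is_open(host, port) for name, port in STACK_PORTS.items()}


if __name__ == "__main__":
    for name, is_up in check_stack().items():
        print(f"{name:<10} {'open' if is_up else 'closed'}")
```

This only proves that something is listening on each port from the Docker host's point of view; the routers still need their own reachability to the Telegraf port.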

Great, our monitoring stack is up! Let’s get some data into it.

I am using containerlab with vrnetlab to run a couple of CSR 1000v routers as ‘container-wrapped’ VMs, but it doesn’t really matter what you use, as long as the routers have reachability to the Telegraf container on port 57000.

As we are using YANG, we need to identify the actual YANG models that the platform supports. We can use something like pygnmi or gnmic for this; however, I decided to brush off an old custom Python script that I had kicking about (just to see if it still worked…). These capabilities are documented, but let’s actively look anyway.
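The script itself isn't shown here, so the following is only a sketch of one way to do it, assuming the capabilities come from a NETCONF `<hello>` message. The sample XML is trimmed and illustrative (the revision date is a placeholder), and `supported_modules` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

NETCONF_NS = "{urn:ietf:params:xml:ns:netconf:base:1.0}"

# Trimmed, illustrative <hello>; a real router advertises hundreds of entries.
# The revision date is a placeholder, not a value taken from a device.
SAMPLE_HELLO = """<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <capabilities>
    <capability>urn:ietf:params:netconf:base:1.1</capability>
    <capability>http://cisco.com/ns/yang/Cisco-IOS-XE-bgp-oper?module=Cisco-IOS-XE-bgp-oper&amp;revision=2000-01-01</capability>
  </capabilities>
</hello>"""


def supported_modules(hello_xml: str) -> list:
    """Return the sorted YANG module names advertised in a NETCONF hello."""
    root = ET.fromstring(hello_xml)
    modules = set()
    for cap in root.iter(f"{NETCONF_NS}capability"):
        # YANG-backed capabilities carry the module name as a query parameter;
        # protocol capabilities (urn:ietf:params:netconf:...) have no query.
        query = urlsplit((cap.text or "").strip()).query
        modules.update(parse_qs(query).get("module", []))
    return sorted(modules)


print(supported_modules(SAMPLE_HELLO))
```

On a live device you would feed this the hello received when opening a NETCONF session, rather than a hard-coded string.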

Running our capabilities script we can see that the YANG model ‘Cisco-IOS-XE-bgp-oper’ is supported, along with countless others. Let’s focus on this model for now in order to gain insight into our BGP operations.

As we are working with YANG, I highly recommend using a tool such as pyang to help visualise the model – git clone the following repo to get all YANG models locally (https://github.com/YangModels/yang)

Let’s browse to the dir where our model resides and run pyang to visualise the tree.

Looking at the above structure, we can deduce the following XPath expression to see the number of prefixes installed in our BGP table:

/bgp-ios-xe-oper:bgp-state-data/neighbors/neighbor/installed-prefixes

From the above, it may be obvious where the part of the expression following the colon (:) comes from, but what about the part before the colon? That is the module’s prefix, which is declared at the top of the module itself. We can open the model and look for the top-level prefix statement.
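To make that lookup concrete, here is a small sketch that pulls the prefix out of a module's text and assembles the XPath from it. The embedded module text is a trimmed, illustrative excerpt rather than the full file, and both function names are made up for this example.

```python
import re

# Trimmed, illustrative header in the shape of the YANG module; the real file
# in the YangModels repo contains far more than this.
SAMPLE_MODULE = """
module Cisco-IOS-XE-bgp-oper {
  yang-version 1;
  namespace "http://cisco.com/ns/yang/Cisco-IOS-XE-bgp-oper";
  prefix bgp-ios-xe-oper;
  container bgp-state-data {
  }
}
"""


def module_prefix(yang_text: str) -> str:
    """Return the argument of the first `prefix` statement in the text.

    In a YANG module the module's own prefix is declared in the header,
    before any import statements, so the first match is the one we want.
    """
    match = re.search(
        r'^\s*prefix\s+"?([A-Za-z_][\w.-]*)"?\s*;', yang_text, re.MULTILINE
    )
    if match is None:
        raise ValueError("no prefix statement found")
    return match.group(1)


def build_xpath(prefix: str, *nodes: str) -> str:
    """Join node names into a path, prefixing only the top-level node."""
    if not nodes:
        raise ValueError("at least one node name is required")
    return f"/{prefix}:" + "/".join(nodes)


prefix = module_prefix(SAMPLE_MODULE)
print(build_xpath(prefix, "bgp-state-data", "neighbors", "neighbor", "installed-prefixes"))
```

The regex is deliberately naive; for anything beyond a quick lookup, pyang is the better tool.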

Now that we have the full XPath and know how to construct it, we can configure our router to send telemetry via gRPC dial-out.

The configuration on IOS-XE is relatively self-explanatory. We are setting the data encoding to key-value Google Protocol Buffers (kvGPB) for use with gRPC, specifying the XPath expression as deduced from our YANG model, and setting the stream to yang-push, as this is ‘streaming’ telemetry leveraging YANG. We are also setting the update policy to ‘periodic’ with a value of 500; the period is expressed in hundredths of a second, so this sends an update every 5 seconds. We could also change this to ‘on-change’ to only send telemetry data when there is a change in the value itself.

telemetry ietf subscription 20
 encoding encode-kvgpb
 filter xpath /bgp-ios-xe-oper:bgp-state-data/neighbors/neighbor/installed-prefixes
 source-address 172.20.20.15
 stream yang-push
 update-policy periodic 500
 receiver ip address 172.20.20.8 57000 protocol grpc-tcp
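If you end up with more than a couple of subscriptions, templating them saves some typing. The sketch below simply reproduces the CLI above from parameters and converts a period in seconds into the hundredths of a second that the periodic update policy expects; `render_subscription` is a hypothetical helper, not part of any Cisco tooling.

```python
def render_subscription(
    sub_id: int,
    xpath: str,
    source: str,
    receiver: str,
    port: int = 57000,
    period_seconds: float = 5.0,
) -> str:
    """Render an IOS-XE dial-out telemetry subscription as CLI text."""
    # The periodic update policy takes hundredths of a second (centiseconds).
    centiseconds = round(period_seconds * 100)
    if centiseconds < 1:
        raise ValueError("period must be at least 0.01 seconds")
    return "\n".join(
        [
            f"telemetry ietf subscription {sub_id}",
            " encoding encode-kvgpb",
            f" filter xpath {xpath}",
            f" source-address {source}",
            " stream yang-push",
            f" update-policy periodic {centiseconds}",
            f" receiver ip address {receiver} {port} protocol grpc-tcp",
        ]
    )


print(
    render_subscription(
        20,
        "/bgp-ios-xe-oper:bgp-state-data/neighbors/neighbor/installed-prefixes",
        source="172.20.20.15",
        receiver="172.20.20.8",
    )
)
```

Swapping the subscription ID and the XPath is all it takes to generate a second subscription against the same receiver.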

We can validate the configuration for each XPath expression by looking at ‘show telemetry ietf subscription <id> detail’. Looks like our config is valid, so let’s build out a Grafana dashboard!

In Grafana we can create a new dashboard and add a panel to visualise the data we are receiving from Telegraf and storing in InfluxDB. The query editor makes it simple: with a few clicks you can cobble together an SQL-esque statement that returns what you require.
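For a feel of what the query editor produces, here is a sketch that builds an InfluxQL-style statement. It assumes an InfluxQL data source, and the measurement and field names are my assumptions about how Telegraf stores this data (measurement named after the model path, hyphens in leaf names turned into underscores), so check them in InfluxDB with SHOW MEASUREMENTS and SHOW FIELD KEYS before relying on them. `$timeFilter` and `$__interval` are Grafana macros that get substituted at query time.

```python
def quote_identifier(identifier: str) -> str:
    """Double-quote an InfluxQL identifier, escaping backslashes and quotes."""
    escaped = identifier.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def last_value_query(measurement: str, field: str) -> str:
    """Build a Grafana-style InfluxQL query returning the last value per interval."""
    return (
        f"SELECT last({quote_identifier(field)}) FROM {quote_identifier(measurement)} "
        "WHERE $timeFilter GROUP BY time($__interval) fill(null)"
    )


# Assumed names; verify against what Telegraf actually wrote to InfluxDB.
MEASUREMENT = "Cisco-IOS-XE-bgp-oper:bgp-state-data/neighbors/neighbor"
FIELD = "installed_prefixes"

print(last_value_query(MEASUREMENT, FIELD))
```

The resulting text can be pasted into the panel's raw query mode instead of clicking it together.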

We have 5 prefixes installed in the BGP table so far!

I’ll advertise some more from our peer and we should see them reflected in the panel we have just built.

We can see that the number of prefixes installed in the BGP table has increased from 5 to 8!

Let’s add another configuration to see the uptime of the peer along with another panel in our dashboard.

telemetry ietf subscription 30
 encoding encode-kvgpb
 filter xpath /bgp-ios-xe-oper:bgp-state-data/neighbors/neighbor/up-time
 source-address 172.20.20.15
 stream yang-push
 update-policy periodic 500
 receiver ip address 172.20.20.8 57000 protocol grpc-tcp

We are leveraging the same YANG model, just a different xPath this time.

Our dashboard configuration looks a little different this time. This is because the value returned is a string and not an int (the value types are shown in the tree returned from pyang). The strings are displayed in a table format this time, and I have chosen to only show the last row in the table.

We can see our peering has been up for 1 week and 4 days.

We can display the panels on the same dashboard as below, allowing us to easily create some useful NOC views from any YANG model! Pretty sweet, right?

So that’s been visualising streaming telemetry data in TIG stack with gRPC, from IOS-XE!

Thanks for reading! As always, code/files are on my GitHub!
https://github.com/thecraigus/TIGBlog/blob/main/telegraf.conf
