A #network administrator's worst nightmare can be intermittent network congestion - it's impossible to predict, short-lived, and high-impact. Can #Python help us find and fix it?

Let's find out! 🧵

Prefer a blog post format? Click here: chrisjhart.com/Practical-Pyth…
One case I've seen in TAC involves customers observing intermittently incrementing input discard counters on the interfaces of a Nexus 5500 switch. This is usually followed by reports of connectivity issues, packet loss, or application latency for traffic flows traversing the switch.
Oftentimes, this issue is highly intermittent and unpredictable. Once every few hours (or sometimes even days), the input discards will increment over the course of a few dozen seconds, then stop.
First, let's explore what input discards mean, the challenges with troubleshooting this intermittent issue manually, and how we can use Python to help troubleshoot.
Cisco Nexus 5000, 6000, and 7000 switches utilize a Virtual Output Queue (VOQ) queuing architecture for unicast traffic.
This means that as a packet enters the switch, the switch:

1. Parses the packet's headers
2. Makes a forwarding decision to determine the packet's egress interface
3. Buffers the packet on the ingress interface in a virtual queue (VQ) dedicated to the egress interface until that interface can transmit the packet
An interface becomes congested when the total sum of traffic that needs to be transmitted out of the interface exceeds the bandwidth of the interface itself.
For example, let's say that Ethernet1/1, Ethernet1/2, and Ethernet1/3 of a switch are all interfaces with 10Gbps of bandwidth.
Let's also say that 7.5Gbps of traffic ingresses the switch through Ethernet1/1, and 7.5Gbps of traffic ingresses the switch through Ethernet1/2.
All 15Gbps of this traffic needs to egress Ethernet1/3. However, Ethernet1/3 is a 10Gbps interface; it can only transmit up to 10Gbps of traffic at once. Therefore, the excess 5Gbps of traffic *must* be buffered by the switch.
This traffic will be buffered at the ingress interfaces (Ethernet1/1 and Ethernet1/2) within a virtual queue that represents Ethernet1/3.
If Ethernet1/3 remains congested for a "long" period of time, then the virtual queue for Ethernet1/3 on the ingress buffer of Ethernet1/1 and Ethernet1/2 will become full, and the switch will not be able to accept new packets destined to Ethernet1/3.
Additional packets are then dropped on ingress, and the "input discards" counter is incremented on the ingress interfaces (Ethernet1/1 and Ethernet1/2). It's important to note that no counter is incremented on the congested egress interface.
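To make this concrete, here's a tiny illustrative Python model of the behavior. The queue depth, interface names, and packet counts are made up for the example - this is not how the Nexus 5500 hardware is actually implemented:

```python
# Toy model of VOQ behavior: packets destined for a congested egress interface
# are buffered (and eventually dropped) at the *ingress* interface.
from collections import defaultdict

VQ_DEPTH = 10  # arbitrary per-VQ depth chosen for this example

virtual_queues = defaultdict(list)  # (ingress, egress) -> buffered packets
input_discards = defaultdict(int)   # ingress interface -> discard counter

def receive_packet(ingress: str, egress: str, packet: str) -> None:
    """Buffer the packet in the ingress VQ for its egress interface, or drop it."""
    vq = virtual_queues[(ingress, egress)]
    if len(vq) < VQ_DEPTH:
        vq.append(packet)
    else:
        # The VQ for the congested egress interface is full, so the packet is
        # dropped on ingress; only the ingress interface's counter increments.
        input_discards[ingress] += 1

# Ethernet1/3 is congested, so its VQs on Ethernet1/1 and Ethernet1/2 fill up.
for i in range(25):
    receive_packet("Ethernet1/1", "Ethernet1/3", f"pkt-{i}")
    receive_packet("Ethernet1/2", "Ethernet1/3", f"pkt-{i}")

print(dict(input_discards))  # {'Ethernet1/1': 15, 'Ethernet1/2': 15}
```

Note that nothing is ever counted against Ethernet1/3, even though it's the real bottleneck.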
The word "long" is purposefully ambiguous because it is subjective. Within a matter of milliseconds, an egress interface can become congested, ingress buffers can fill, and input discards can begin to increment on a switch's interfaces.
A few milliseconds is a very short amount of time to a human being, but to a computer, a few milliseconds can be a lifetime.
The two key points are:

1. Input discards in a VOQ queuing architecture indicate one or more congested egress interfaces
2. There is no counter on an interface that identifies it is congested
Next, let's investigate how you would troubleshoot a network congestion issue on Nexus 5500 switches manually.
Ultimately, network congestion is a system throughput problem. There are two fundamental ways to solve a throughput problem in a system:

1. Identify the bottleneck of the system and improve it.
2. Reduce the flow of the system until the bottleneck no longer exists.
Option #2 is not very helpful because network engineers often have little or no control over the amount of traffic within the network, which leaves us with Option #1.
Dr. Goldratt's Theory of Constraints states that making improvements anywhere but the bottleneck of a system will not improve the throughput of the system. Therefore, identifying a congested egress interface is extremely important to solving network congestion.
Nexus 5500 series switches offer a command - "show hardware internal carmel asic <x> registers match .*STA.*frh.*" - that displays a set of ASIC registers that match the regular expression ".*STA.*frh.*".
Registers matching this pattern identify the amount of data stored in a specific egress interface's virtual queue at the time of the command's execution.
The output of this command shows that the virtual queue corresponding to ASIC 0's memory address 5 (indicated by the "addr_5" substring of the register name) contained data when the command was executed.
This output does *not* show historical data - it's a snapshot of the ASIC's buffers at the time the command is executed.
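If you want to capture that snapshot programmatically, here's a rough sketch of the idea. It assumes the Netmiko library, placeholder credentials, and a simple "register_name ... 0xVALUE" output layout - the real CLI output may be formatted differently, so treat the parsing as illustrative:

```python
# Sketch: capture a snapshot of the STA/frh egress VQ registers over SSH.
# Assumes Netmiko, placeholder credentials, and a "register_name ... 0xVALUE"
# output layout; the real command output may be formatted differently.
import re
from netmiko import ConnectHandler

switch = {
    "device_type": "cisco_nxos",
    "host": "192.0.2.10",  # placeholder management IP
    "username": "admin",
    "password": "password",
}

with ConnectHandler(**switch) as conn:
    output = conn.send_command(
        "show hardware internal carmel asic 0 registers match .*STA.*frh.*"
    )

# Keep only registers holding a non-zero value, i.e. virtual queues that
# contained data at the instant the command was executed.
for line in output.splitlines():
    match = re.search(r"(\S*STA\S*frh\S*addr_(\d+)\S*)\D+0x([0-9a-fA-F]+)", line)
    if match and int(match.group(3), 16) != 0:
        print(f"ASIC 0, memory address {match.group(2)}: {match.group(1)}")
```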
We can translate the ASIC number and memory address to an egress interface we recognize using the "show hardware internal carmel all-ports" command.
This table has three columns we should pay attention to - "name", "car", and "mac". The "car" column (which stands for "Carmel", the internal name of the Nexus 5500 ASIC) maps to the ASIC identifier. The "mac" column maps to the memory address.
The "name" column contains the internal name of the switch's interface, the numbering of which (e.g. 1/1, 1/2, etc.) maps to the external-facing interface we're familiar with (e.g. Ethernet1/1, Ethernet1/2, etc.).
The ASIC registers displayed by the "show hardware internal carmel asic <x> registers match .*STA.*frh.*" command indicate that memory address 5 of ASIC 0 contained data when the command was executed.
The table displayed by the "show hardware internal carmel all-ports" command translates this to internal interface xgb1/5, whose numbering means that the virtual queue for interface Ethernet1/5 contained data when the command was executed.
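Here's a rough sketch of how that translation could be automated, assuming the "all-ports" output is a whitespace-separated table containing "name", "car", and "mac" columns (the exact layout is an assumption and may vary by platform and release):

```python
# Sketch: build a {(car, mac): name} lookup from the whitespace-separated
# "show hardware internal carmel all-ports" table. The exact column layout
# is an assumption; adjust the parsing to match your platform's output.
def build_port_map(all_ports_output: str) -> dict:
    port_map = {}
    header = None
    for line in all_ports_output.splitlines():
        fields = line.split()
        if header is None:
            # Locate the header row so we know which column is which.
            if "name" in fields and "car" in fields and "mac" in fields:
                header = {column: index for index, column in enumerate(fields)}
            continue
        if len(fields) >= len(header):
            name = fields[header["name"]]
            car = fields[header["car"]]
            mac = fields[header["mac"]]
            port_map[(car, mac)] = name
    return port_map

# Example usage: a register hit on ASIC 0, memory address 5 would translate to
# internal interface xgb1/5, which corresponds to front-panel Ethernet1/5.
# port_map = build_port_map(all_ports_output)
# print(port_map.get(("0", "5")))  # e.g. "xgb1/5"
```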
When network congestion is constantly occurring (meaning, input discards are constantly incrementing on one or more interfaces), you can execute these commands rapidly and get a very good idea of which interface is congested.
However, these commands are not very useful when network congestion is intermittent and strikes over a small period of time.
Most organizations cannot expect an employee to monitor interfaces for incrementing input discards over several hours *and* parse the collected data without error.

Thankfully, this is a perfectly reasonable expectation for a Python script!
I created and published a Python script that solves this problem: github.com/ChristopherJHa…
It connects to a Nexus 5500 switch over SSH and rapidly (~10-15 checks per second) monitors a single interface for incrementing input discards.
The script identifies when input discards start to increment and collects ASIC register information.
If the script is interrupted with Control+C, the collected ASIC register data is parsed and summarized. Registers are conveniently translated to the external-facing interface names that network administrators are familiar with.
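For the curious, the overall approach looks roughly like this. This is a simplified sketch, *not* the published script itself - it assumes Netmiko, placeholder credentials, a hard-coded interface and ASIC, and that "show interface" output contains an "N input discard" counter line:

```python
# Simplified sketch of the approach (not the published script): rapidly poll
# one interface's input discard counter and, whenever it increments, capture
# the carmel register snapshot for later analysis.
import re
import time
from netmiko import ConnectHandler

switch = {
    "device_type": "cisco_nxos",
    "host": "192.0.2.10",  # placeholder management IP
    "username": "admin",
    "password": "password",
}

MONITORED_INTERFACE = "Ethernet1/1"
snapshots = []  # raw register output captured whenever discards increment

def get_input_discards(conn, interface: str) -> int:
    output = conn.send_command(f"show interface {interface}")
    match = re.search(r"(\d+)\s+input discard", output)
    return int(match.group(1)) if match else 0

with ConnectHandler(**switch) as conn:
    last = get_input_discards(conn, MONITORED_INTERFACE)
    try:
        while True:
            current = get_input_discards(conn, MONITORED_INTERFACE)
            if current > last:
                snapshots.append(conn.send_command(
                    "show hardware internal carmel asic 0 registers match .*STA.*frh.*"
                ))
            last = current
            time.sleep(0.05)  # rough pacing; real check rate depends on the device
    except KeyboardInterrupt:
        print(f"Captured {len(snapshots)} register snapshots for analysis")
```

From there, the captured snapshots can be parsed and translated to front-panel interface names as shown earlier.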
Overall, this script took about 4-6 hours to develop, test, and refine. In return, I hope it will save engineers countless hours tracking down congested egress interfaces and improve the reliability of the world's networks!
