Discovered an interesting issue at home today - when I ping a Nexus 9000v running in CML from an Ubuntu host, I see duplicate replies.
At first glance, you might think the Nexus is duplicating replies. Meaning, a single ICMP Echo Request packet enters the switch, and the Nexus sends two ICMP Echo Reply packets.
However, that's not the case - if you run Ethanalyzer on the mgmt0 interface of the Nexus, the Nexus sees two ICMP Echo Request packets arrive with the same sequence number. Therefore, it generates two ICMP Echo Reply packets.
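For reference, a control-plane capture along these lines on the Nexus is enough to see both Requests arrive with the same sequence number (exact Ethanalyzer syntax can vary slightly between NX-OS releases):

ethanalyzer local interface mgmt display-filter "icmp"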
The Nexus is a victim of the problem, not the cause.
If we run tcpdump on the Ubuntu box's network interface, filtered on packets destined to the Nexus *and* in the outbound direction, we can see the Ubuntu box is sending one ICMP Echo Request packet per ping. So the Ubuntu box isn't responsible for duplicating these packets.
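The capture looked roughly like this - the interface name (ens160) is just what my Ubuntu VM happens to use, and the destination is the Nexus's mgmt0 address, so adjust both for your environment:

# outbound ICMP destined to the Nexus only
sudo tcpdump -ni ens160 -Q out "icmp and dst host 192.168.30.115"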
When I'm troubleshooting path-of-the-packet issues like this, I sometimes like to follow a "cocktail shaker" approach: capture packets on one side of the flow (the source), then capture packets on the other side of the flow (the destination).
We know that duplicate ICMP Echo Request packets are causing this issue. We don't know where the duplicate ICMP Echo Request packets are coming from. The flow looks good at the source, but is broken at the destination. That means something in the middle is broken.
The next "hop" in this flow is the VMware ESXi standard vSwitch that the source connects to. The world ID for the virtual machine is 2101747, per this output.
The Port ID that corresponds with this virtual machine is 50331670, per this output.
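If you want to pull the World ID and Port ID yourself, the esxcli equivalents look roughly like this:

esxcli network vm list                   # lists each running VM with its World ID
esxcli network vm port list -w 2101747   # maps that World ID to its vSwitch Port ID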
We can use this Port ID with ESXi's pktcap-uw command to capture packets that ingress the vSwitch from the virtual machine. In this command, I'm also filtering on IP protocol 1, which is ICMP.
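The capture command looks something like this - 0x01 is the IP protocol number for ICMP, and I believe --dir 0 (the default) is the "into the vSwitch" direction here:

pktcap-uw --switchport 50331670 --proto 0x01 --dir 0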
If the Ubuntu VM's tcpdump were lying to me and the Ubuntu VM *is* duplicating ICMP Echo Request packets, I'd expect to see more than three packets ingress when I run "ping 192.168.30.115 -c 3" on the VM. However, I only see three packets, which is good! The Ubuntu VM is innocent.
Next, let's move to the CML VM where the Nexus is running. World ID is 2101027, which resolves to Port ID 50331674. Let's capture packets leaving the vSwitch towards CML to prove the vSwitch's innocence.
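Same idea as before, just pointed at CML's Port ID and flipped to the vSwitch-to-VM direction:

pktcap-uw --switchport 50331674 --proto 0x01 --dir 1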
My test ping that reproduces this issue isn't running, and yet this packet capture is capturing other "noise" on the wire. This means we need to filter our packet capture facing the CML VM a bit further.
Let's add the --mac parameter filtered on our Ubuntu VM's MAC address to fix this.
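Something along these lines - the MAC below is a placeholder for the Ubuntu VM's actual vNIC MAC:

pktcap-uw --switchport 50331674 --proto 0x01 --dir 1 --mac <ubuntu-vm-mac>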
Now, if I send three ICMP Echo Request packets from my Ubuntu VM...
...I see *nine* in the packet capture. Ubuntu says it received 7 ICMP Echo Reply packets (including 4 duplicates) - that's because the ping command exits as soon as it receives the final expected Reply, so any duplicates arriving after that point aren't counted.
This seems to suggest that the standard vSwitch is duplicating these ICMP Echo Request packets for some reason. We can confirm this by looking at the ICMP sequence number within pktcap-uw's output, which is highlighted here.
If this were a production network, our gut instinct would be to open a support case with VMware at this point.
However, let's think about this a bit harder.
A vSwitch is *not* a normal switch. A key difference is that vSwitches do not dynamically learn MAC addresses.
This means that when a packet from the Nexus 9000v running in CML enters the vSwitch, the vSwitch doesn't learn that packet's source MAC address, so it has no way of knowing that future packets destined to the Nexus should egress the vSwitch port connected to the CML virtual machine.
Therefore, when the vSwitch receives a packet destined to a MAC address that doesn't correspond with a virtual machine connected to the vSwitch, the vSwitch will do what every switch does - it will unknown unicast flood that packet.
A key detail I've neglected to share until now is the fact that this vSwitch (vSwitch1) has multiple physical uplinks. vmnic1-3 connect to Gi1/0/14-16 of a Cisco Catalyst switch.
The Catalyst switch is a *real* switch, in that it *does* dynamically learn MAC addresses. Specifically, the Catalyst switch has learned the MAC of the Nexus 9000v switch (5254.000d.d27c) on Gi1/0/15.
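You can confirm this on the Catalyst with:

show mac address-table address 5254.000d.d27c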
When the vSwitch unknown unicast floods, it's most likely flooding the ICMP Echo Request packet out of either vmnic1 (connecting to Gi1/0/14) or vmnic3 (connecting to Gi1/0/16). The switch then forwards this packet back to the vSwitch via Gi1/0/15 (where the MAC is learned).
Luckily, we don't have to guess - we can use pktcap-uw to confirm this theory.
The --dir parameter controls what direction of packets we'd like to capture. A value of 0 captures packets received by the uplink port from the network, 1 captures packets sent by the uplink port.
Sure enough, when we ping, we see three ICMP Echo Request packets egress vmnic3 towards Gi1/0/16 of the Catalyst switch.
If we capture packets that ingress vmnic2 from the network connecting to Gi1/0/15 of the Catalyst (where the Nexus MAC is learned), we can see the same ICMP Echo Request packets enter the vSwitch once more.
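The two uplink captures look roughly like this:

pktcap-uw --uplink vmnic3 --dir 1 --proto 0x01   # sent out vmnic3 toward Gi1/0/16
pktcap-uw --uplink vmnic2 --dir 0 --proto 0x01   # received on vmnic2 from Gi1/0/15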
If this were a normal network, this would be an endless loop. However, I'm assuming that VMware implemented a mechanism where packets that ingress a vSwitch via an uplink port cannot egress via any other uplink port. In the Cisco world, we call this a "Deja-Vu check".
There are two possible fixes I can see for this issue:
1. Remove all redundant uplink ports from the vSwitch, such that the vSwitch only has a single uplink port.
2. Migrate CML to its own dedicated vSwitch with its own dedicated uplink port.
I opted for the 2nd option so that the rest of my VMs still have some sort of redundancy.
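For anybody curious, the rough sequence of esxcli commands for the second option looks like this - the new vSwitch name, portgroup name, and choice of vmnic are placeholders from my lab, not requirements:

esxcli network vswitch standard uplink remove -v vSwitch1 -u vmnic3   # free up a physical uplink
esxcli network vswitch standard add -v vSwitch2                       # dedicated vSwitch for CML
esxcli network vswitch standard uplink add -v vSwitch2 -u vmnic3
esxcli network vswitch standard portgroup add -v vSwitch2 -p CML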
Sure enough, the issue is resolved!
Now that we've figured out the root cause of the issue and implemented a fix, we have one last question to answer - "How was this working before? Why did this break?"
A few weeks ago, our air conditioning broke in the middle of a heat wave. In response, I powered off my lab for a few days while the air conditioning was being fixed.
My lab only has a single ESXi host, which boots from an internal flash drive.
When I powered the host back on, I found that the flash drive had failed. I needed to replace it, reinstall ESXi, and import all of my virtual machines.
Prior to the outage, I used vCenter Server Appliance. After the outage, I decided not to use it anymore because I don't need it.
When I used VCSA, the ESXi host connected to the Catalyst switch through an LACP port-channel (LACP requires a distributed vSwitch, which in turn requires vCenter). After removing VCSA, I opted to remove the port-channel from both sides in favor of independent uplinks.
With a port-channel, the Catalyst switch would perform the Deja-Vu check when it receives a flooded ICMP Echo Request packet and drop the packet in hardware. This behavior would prevent this issue from occurring. Once the port-channel was removed, this issue was introduced.
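For reference, the Catalyst side of the old setup looked something like this (the channel-group number is arbitrary):

interface range GigabitEthernet1/0/14 - 16
 channel-group 1 mode active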
• • •
Lots of things can ruin the average person's Christmas holiday. In 2019, one network engineering team ruined their Christmas by combining IP SLA operations, track objects, and static routes.
It was Christmas Day of 2019, and I was working the holiday shift in Cisco TAC. Working Christmas is enjoyable - it tends to be quiet, and in the rare case you need to assist with an issue or outage, customers are nice and in good spirits.
On this day, a case came in requesting a Root Cause Analysis (RCA) for an outage that happened a few hours earlier. The outage lasted about 23 minutes, and the environment recovered on its own.
A #network administrator's worst nightmare can be intermittent network congestion - it's impossible to predict, short-lived, and has major impact. Can #Python help us find and fix it?
A case I've seen in TAC is where customers observe intermittently incrementing input discard counters on interfaces of a Nexus 5500 switch. This is usually followed by reports of connectivity issues, packet loss, or application latency for traffic flows traversing the switch.
Oftentimes, this issue is highly intermittent and unpredictable. Once every few hours (or sometimes even days), the input discards will increment over the course of a few dozen seconds, then stop.
Excellent thread from Nick on this topic! A big point I'm a fan of:
"...most juniors don't have an immediately accessible lab on their laptops or cloud environment, because they don't spend much time labbing. Most mid-levels can spin up a topology on demand."
Labbing something does not have to be an arduous, time-intensive process. Being familiar with the lab resources available to you and knowing how to use them efficiently is paramount to getting definitive answers to questions quickly.
For example, let's say somebody asks me whether changing the MTU on Layer 3 interfaces between two routers causes an OSPF adjacency between both routers to immediately flap.
A common misunderstanding engineers have about Equal-Cost Multi-Pathing (ECMP) and port-channels is that they increase the bandwidth that can be used between two network devices. This *can* be true, but isn't *always* true.
Curious why? 🧵
First, let's review our topology. Three Cisco Nexus switches are connected in series. Traffic generators are connected to Switch-1 and Switch-2 through physical interface Ethernet1/36. Switch-1 and Switch-2 connect to Router through Layer 2 port-channels.
As the names suggest, Switch-1 and Switch-2 are purely Layer 2 switches. Router is a router that routes between two networks - 192.168.1.0/24, and 192.168.2.0/24. The traffic generator mimics four hosts - two in 192.168.1.0/24, two in 192.168.2.0/24.
"I see a lot of packet loss when I ping my switch" 🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩🚩
Wait, why is this a red flag? Let's dig into this behavior in a bit more detail... 🧵
First, let's take a look at our topology. We have two hosts in different subnets that connect to a Cisco Nexus 9000. One host connects via Ethernet1/1, and the other connects via Ethernet1/2. Ethernet1/1 has an IP of 192.168.10.1, while Ethernet1/2 owns 192.168.20.1.
The architecture of most network devices has three "planes" - a data plane, a control plane, and a management plane. We'll focus on the first two. The data plane handles traffic going *through* the device, while the control plane handles traffic going *to* the device.
On Cisco Nexus switches in production environments, avoid working within a configuration context on the CLI unless you're actively configuring the switch. Otherwise, you might accidentally cause an outage by trying to run a show command.
Curious how that's possible? 🧵
Cisco IOS and IOS-XE require you to prepend show commands with the "do" keyword to execute them within a configuration context.
NX-OS does not require you to do this - you can run a show command within any configuration context without the "do" keyword.
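For example, the same show command in a configuration context on each OS - IOS/IOS-XE needs the "do" prefix, NX-OS doesn't:

Router(config)# do show ip interface brief
switch(config)# show ip interface brief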