I was experiencing strange behavior with virtual machines running on the ESXi hypervisor, caused by the Cisco Nexus bug CSCui86494. The client was reporting a very slow or even unresponsive application with long waiting times. When I tried a basic ping, packet loss was somewhere between 70 and 80%. What made the investigation harder was that only around 10 virtual machines out of several hundred were affected, and they were running on different ESXi hosts. The network settings on the ESXi servers were checked, but no root cause was found. What was also interesting is that vMotion always fixed the issue and the VMs became available again. After some time, the client also reported application slowness on a physical server, which confirmed the presumption that it was a network issue.
During the investigation, we found out that in every affected pair of communicating systems, one system was always sitting in one particular subnet. From this point it was quick and easy to find out that the issue was caused by a Cisco HSRP bug affecting just one particular VLAN and just a very small number of servers sitting in this subnet.
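When the list of affected systems is longer, spotting the common subnet can be scripted. The sketch below is only an illustration: the address pairs are hypothetical documentation addresses, not data from this incident, and it assumes every subnet is a /24. For each subnet it counts how many affected pairs have at least one member in it, and keeps the subnets that appear in every pair.

```python
import ipaddress
from collections import Counter

# Hypothetical pairs of communicating systems that showed packet loss.
# Documentation ranges only, not data from the real incident.
AFFECTED_PAIRS = [
    ("192.0.2.15", "198.51.100.20"),
    ("192.0.2.40", "203.0.113.7"),
    ("198.51.100.33", "192.0.2.88"),
]

def common_subnets(pairs, prefix=24):
    """Return the subnets that contain at least one member of every pair.

    Assumes one fixed prefix length for all subnets, which is a simplification.
    """
    counts = Counter()
    for pair in pairs:
        # A set, so each subnet is counted at most once per pair.
        nets = {ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in pair}
        counts.update(nets)
    return [net for net, n in counts.items() if n == len(pairs)]

if __name__ == "__main__":
    for net in common_subnets(AFFECTED_PAIRS):
        print(net)  # 192.0.2.0/24
```

With real data, the pairs would come from the monitoring or ping results rather than a hard-coded list.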
What is HSRP? Hot Standby Router Protocol (HSRP) is a Cisco redundancy protocol which establishes a fault-tolerant default gateway.
The goal of HSRP is to allow hosts to appear to use a single router and to maintain connectivity even if the actual first-hop router they are using fails. Multiple routers participate in this protocol and in concert create the illusion of a single virtual router. The protocol ensures that one and only one of the routers is forwarding packets on behalf of the virtual router. End hosts forward their packets to the virtual router.
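A small Python model can make the "one and only one router forwards" idea concrete. It is a static sketch, not an HSRP implementation: it ignores hello and hold timers, preemption and state transitions (without preemption, the highest-priority router is not necessarily the active one), and the router names and addresses are hypothetical. It picks the active router by highest priority, with the highest IP address as the tie-breaker, and derives the well-known virtual MAC address that end hosts get when they ARP for the virtual gateway IP (0000.0c07.acXX for HSRP version 1, 0000.0c9f.fXXX for version 2).

```python
import ipaddress
from dataclasses import dataclass
from typing import List

@dataclass
class HsrpRouter:
    name: str
    ip: str
    priority: int = 100  # HSRP default priority

def virtual_mac(group: int, version: int = 1) -> str:
    """Return the well-known HSRP virtual MAC for a standby group (Cisco dotted format)."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups range from 0 to 255")
        return f"0000.0c07.ac{group:02x}"
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups range from 0 to 4095")
        return f"0000.0c9f.f{group:03x}"
    raise ValueError("HSRP version must be 1 or 2")

def elect_active(routers: List[HsrpRouter]) -> HsrpRouter:
    """Choose the single active router: highest priority, then highest IP address.

    Static snapshot only: preemption, timers and state changes are ignored.
    """
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.ip)))

if __name__ == "__main__":
    routers = [
        HsrpRouter("nexus-a", "10.0.10.2", priority=110),  # hypothetical
        HsrpRouter("nexus-b", "10.0.10.3"),                 # hypothetical
    ]
    active = elect_active(routers)
    print(active.name, virtual_mac(10))  # nexus-a 0000.0c07.ac0a
```

Because every host in the VLAN sends gateway traffic to that one virtual MAC, the switches must point it at the right interface; this is exactly the MAC-table entry the bug gets wrong.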
This also explains why vMotion always fixed the issue. After a vMotion, the ESXi host sends a Reverse Address Resolution Protocol (RARP) frame that notifies the physical switch about the virtual machine's MAC address. When the HSRP vMAC address was pointing to an incorrect interface, the vMotion re-registered the MAC address and fixed the issue.
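To show what that notification looks like on the wire, the following sketch builds a RARP frame of the kind ESXi hosts are commonly observed to send after a vMotion: a broadcast Ethernet frame with EtherType 0x8035, opcode 3 (request reverse), the VM's MAC address as both sender and target hardware address, and both protocol addresses left at 0.0.0.0. Treat these field values as an assumption rather than VMware's specification; the function names and the example MAC address (inside VMware's 00:50:56 prefix) are hypothetical. The frame is returned without the padding to the 60-byte Ethernet minimum that is normally added before transmission.

```python
import struct

ETH_P_RARP = 0x8035          # EtherType for RARP
ARPHRD_ETHER = 1             # hardware type: Ethernet
ETH_P_IP = 0x0800            # protocol type: IPv4
OP_REQUEST_REVERSE = 3       # RARP "request reverse" opcode

def mac_bytes(mac: str) -> bytes:
    """Convert '00:50:56:aa:bb:cc' (also '-' or Cisco '.' notation) to 6 bytes."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "")
    raw = bytes.fromhex(digits)
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw

def build_rarp_announcement(vm_mac: str) -> bytes:
    """Build an unpadded Ethernet + RARP frame announcing vm_mac.

    Assumed layout: broadcast destination, VM MAC as source, sender and
    target hardware address both set to the VM MAC, IP addresses 0.0.0.0.
    """
    mac = mac_bytes(vm_mac)
    eth_header = b"\xff" * 6 + mac + struct.pack("!H", ETH_P_RARP)
    rarp = struct.pack(
        "!HHBBH6s4s6s4s",
        ARPHRD_ETHER, ETH_P_IP,
        6, 4,                     # hardware / protocol address lengths
        OP_REQUEST_REVERSE,
        mac, b"\x00" * 4,         # sender hardware / protocol address
        mac, b"\x00" * 4,         # target hardware / protocol address
    )
    return eth_header + rarp

if __name__ == "__main__":
    frame = build_rarp_announcement("00:50:56:aa:bb:cc")  # hypothetical VM MAC
    print(len(frame), frame.hex())
```

A switch receiving this frame learns the VM's MAC address on the port where the frame arrived, which is how the physical network finds the VM on its new host.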
Cisco reports the bug under Resolved Caveats as CSCui86494, with the following symptom:
The switch is pointing a remote dynamically learned HSRP vMAC address to an incorrect interface and also shows the entry as static in the MAC-address table. After you add a static MAC entry to ensure the switch points the MAC address to the right interface, the switch points the MAC address to the correct interface. If the static MAC address is removed, the entry still shows up in the switch as static.
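The workaround in this symptom description relies on ordinary MAC-table semantics: a dynamic entry follows the port on which the source MAC was last seen, while a static entry is not moved by learning. The toy table below illustrates only those semantics; it is not how NX-OS is implemented, it does not reproduce the bug itself, and the vMAC and interface names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MacEntry:
    port: str
    static: bool = False

class MacTable:
    """Toy layer-2 MAC address table (illustration only, not NX-OS)."""

    def __init__(self) -> None:
        self.entries: Dict[str, MacEntry] = {}

    def learn(self, mac: str, ingress_port: str) -> None:
        """Learn a source MAC from a received frame.

        Dynamic entries move to the ingress port; static entries are left alone.
        """
        entry = self.entries.get(mac)
        if entry is not None and entry.static:
            return
        self.entries[mac] = MacEntry(ingress_port)

    def add_static(self, mac: str, port: str) -> None:
        """Pin a MAC address to a port, as in the workaround."""
        self.entries[mac] = MacEntry(port, static=True)

    def lookup(self, mac: str) -> Optional[str]:
        """Return the egress port, or None (flood) for an unknown MAC."""
        entry = self.entries.get(mac)
        return entry.port if entry is not None else None

if __name__ == "__main__":
    VMAC = "0000.0c07.ac0a"  # hypothetical HSRPv1 vMAC for group 10
    table = MacTable()
    table.learn(VMAC, "Ethernet2/5")       # entry ends up on the wrong port (the bug's cause is not modelled)
    table.add_static(VMAC, "Ethernet1/1")  # workaround: pin it to the right port
    table.learn(VMAC, "Ethernet2/5")       # learning no longer moves it
    print(table.lookup(VMAC))              # Ethernet1/1
```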
As a permanent fix, Cisco recommends installing the 6.2 release of Cisco Nexus 7000 Series NX-OS, which should fix the CSCui86494 bug. For more information, refer to the official Release Notes.
If you don't want to upgrade NX-OS, a temporary solution and quick fix is to disable the gateway function on the Cisco Nexus 7000 Series, which was also our case. The NX-OS upgrade is planned for the next couple of weeks.
Author: Jan Hosek