Two weeks ago, we started regular updates of our Flex System servers, including drivers and firmware. It is good practice to update drivers first, so I followed this procedure as usual.
Our update list included the ESXi driver for the CN4022 (QLogic Broadcom BCM57840). Since IBM/Lenovo doesn’t provide VMware drivers for those cards on their site, I downloaded the latest driver available on the VMware site for ESXi 6.0 (bnx2x version 2.713.30.v60.8).
The update was successful and everything looked fine after the reboot. However, that lasted only until I rebooted the server a second time.
I was getting the following error in the IMM, and the system halted during uEFI initialization (I had to reset the uEFI settings to boot into the OS):
Sensor Mezz Exp 1 Fault has transitioned to critical from a less severe state.
I’m guessing some register or setting on the card changed when ESXi loaded the driver for the first time, and the card firmware wasn’t able to handle it.
ESXi couldn’t load the bnx2x module anymore, although the cards were still visible in the PCI device list.
I also tried to update the firmware to the new version using bootable media, but with no luck.
I managed to brick 3 servers and 6 cards before I discovered what was causing it, since everything looked good after the first reboot.
The only way to fix this was to contact the vendor to replace the cards!
Don’t forget to downgrade the driver in ESXi before using the replacement cards, because I’m pretty sure it would brick them again!
All of this happened with the following combination:
- Driver bnx2x version 2.713.30.v60.8
- Firmware 7.12e.4.2d
- Bootcode for 578xx (MFW) 7.13.24
- UEFI Driver NX2_Ev 7.13.4
Please note that this driver (bnx2x version 2.713.30.v60.8) works fine after I updated the firmware to the new version 7.13b.4.1c (Bootcode for 578xx (MFW) 7.13.75, UEFI Driver NX2_Ev 7.15.0), which I was planning to do right after the driver update anyway.
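If you want a guard against this before pushing a driver update, something like the small check below could help. It is only a sketch: the function name and the lookup tables are made up for illustration, and the only data in them are the two driver/bootcode combinations described in this post. Any other pair comes back as "unknown", because I simply don't know how other firmware versions behave.

```python
# Hypothetical pre-update guard. The names here are illustrative only;
# this is not a vendor tool and it does not query the card itself.

DRIVER = "2.713.30.v60.8"

# (bnx2x driver version, MFW bootcode version) pairs taken from this post.
KNOWN_BAD = {("2.713.30.v60.8", "7.13.24")}   # bricked the cards
KNOWN_GOOD = {("2.713.30.v60.8", "7.13.75")}  # worked fine


def check_combination(driver: str, bootcode: str) -> str:
    """Classify a driver/bootcode pair as 'bad', 'good', or 'unknown'."""
    pair = (driver.strip(), bootcode.strip())
    if pair in KNOWN_BAD:
        return "bad"
    if pair in KNOWN_GOOD:
        return "good"
    # Nothing is known about any other combination.
    return "unknown"


if __name__ == "__main__":
    for mfw in ("7.13.24", "7.13.75"):
        print(f"driver {DRIVER} + MFW {mfw}: {check_combination(DRIVER, mfw)}")
```

You would still have to read the installed driver and bootcode versions from the host yourself and feed them in; treat "unknown" as a reason to update the firmware first rather than as a green light.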
Hi Dusan,
did the vendor replace the bricked cards free of charge? (I assume the servers were under a support contract.)
Pavol
Hi, yes they did 😉
So in this instance, are you saying you should update the firmware first, before the drivers?
Hi, I’m not sure what will happen if you are running a different firmware, but I would strongly recommend updating the firmware first if you are running the version mentioned above.
I’d have to see if I can check the version #s, but this sounds an awful lot like what I’ve had happen on a couple of HP ProLiant BL460c Gen8 servers with a BCM57810S. Interestingly, it’s only happened on machines where we used Update Manager to upgrade from 5.5 -> 6.5 -> 6.5U1, and not on ones we wiped and reinstalled with 6.5, then took to U1 with Update Manager.