Those of you who follow social media, or simply run ESXi 5.5 Update 3 and use snapshots, are already aware of the problem described in VMware KB 2133118.
After a snapshot consolidation task completes, virtual machines running on VMware ESXi 5.5 Update 3 hosts crash. These are the kinds of errors you will see in the virtual machine's vmware.log file:
[YYYY-MM-DD] <TIME>Z| vcpu-0| I120: SNAPSHOT: SnapshotDiskTreeFind: Detected node change from 'scsi0:1' to 'snapshot0.disk1.node'.
[YYYY-MM-DD] <TIME>Z[+0.000]| vcpu-0| W110: Caught signal 11 -- tid 1272281 (addr 98)
[YYYY-MM-DD] <TIME>Z[+0.000]| vcpu-0| I120: Unexpected signal: 11.
[YYYY-MM-DD] <TIME>Z[+9.079]| vcpu-0| I120: Msg_Post: Error
[YYYY-MM-DD] <TIME>Z[+9.079]| vcpu-0| I120: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)
[YYYY-MM-DD] <TIME>Z[+9.079]| vcpu-0| I120+ Unexpected signal: 11.
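If you want to check a pile of vmware.log files for this pattern, here is a minimal sketch in Python. The function name `find_crash_signature` is hypothetical, and the matching rule is my own assumption rather than anything VMware prescribes: a "Detected node change" line followed later by "Unexpected signal: 11". Treat a hit as a hint to go read the KB, not as a diagnosis.

```python
import re
import sys

# Strings taken from the log excerpt above. Pairing them up as a
# "signature" is an assumption for illustration, not an official rule.
NODE_CHANGE = re.compile(r"SnapshotDiskTreeFind: Detected node change")
SIGNAL_11 = re.compile(r"Unexpected signal: 11")


def find_crash_signature(log_text):
    """Return (node_change_line, signal_line) as 1-based line numbers,
    or None if the node-change line is never followed by signal 11."""
    node_change_at = None
    for line_no, line in enumerate(log_text.splitlines(), start=1):
        if NODE_CHANGE.search(line):
            # Remember the most recent node change seen so far.
            node_change_at = line_no
        elif node_change_at is not None and SIGNAL_11.search(line):
            return node_change_at, line_no
    return None


if __name__ == "__main__":
    # Usage: python check_vmware_log.py /path/to/vmware.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            hit = find_crash_signature(handle.read())
        print(path, "MATCH at lines %d/%d" % hit if hit else "no match")
```

The script only reads the files you pass on the command line and changes nothing on the host. A signal 11 with no preceding node-change line is deliberately ignored, since that crash could have many other causes.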
The short timeline of the issue speaks to its high probability and overall urgency:
- 15-Sep-2015: VMware ESXi 5.5 Update 3 released
- 20-Sep-2015: VMs crashing after snapshot deletion reported on the VMware Forums
- 29-Sep-2015: Issue noted on Veeam Forum as well
- 1-Oct-2015: VMware publishes KB 2133118 describing the issue and suggesting downgrade as the only workaround
- 6-Oct-2015: VMware updates KB 2133118 and releases VMware ESXi 5.5 Update 3a where the issue is resolved
No wonder this bug affected a lot of environments: snapshots are a vital and commonly used feature in vSphere. Imagine relying on a backup solution such as Veeam Backup. Not only would your VMs crash, but you would also be without backups for weeks until you either downgraded or waited out the fix.
Nikolay Nikolov posted a nice review of Runecast Analyzer recently. Since one of its core features is a VMware KB scan, I tested Runecast on an infrastructure with ESXi 5.5 Update 3 and it detected the potential issue immediately.
This particular VMware KB went viral within our virtualization community, mainly due to its broad scope, high probability and impact. But VMware publishes and updates KBs every week, and it is difficult for an admin to keep track of them all, especially those that relate to more specific configurations.
This is the whole reason we decided to develop Runecast Analyzer: we had suffered numerous outages and, every time, ended up reading the relevant KB only after spending hours or days troubleshooting.
I would be happy to hear your feedback.
Stanimir Markov, October 13, 2015