I recently started another round of ESXi patching (because of this) and ran into an unpleasant issue: some of my clusters failed to remediate using Update Manager.
Hosts failed to enter maintenance mode with the following message:
Current progress of remediation: 0 hosts completed successfully, 0 hosts completed with errors, 0 hosts are being remediated, 0 hosts are waiting to start remediation, and 2 hosts could not enter maintenance mode and are waiting to retry.
Orchestrated patching using Update Manager has a fairly short checklist, and I had everything on it covered. My Remediation Pre-check was showing no issues either:
- automated DRS, with no VM overrides
- no host CD-ROM attached to the VMs
- admission control disabled in case of low resources (this can be disabled by Update Manager automatically)
- Enough RAM and CPU to put your host into maintenance mode of course 🙂
I googled a bit and found posts describing the same issue, but no solution. Users ended up putting hosts into maintenance mode manually and patching them one by one.
This happens if you have a cluster with only 2 nodes, regardless of the admission control settings.
So I opened a support case, and it looks like the solution is pretty easy:
Just disable vSphere HA before remediation 🙂
I’m pretty sure this was fine in vSphere 5.5 and even 6.0. As far as I remember, I have been seeing this behavior since 6.5.
Don’t forget to turn HA back on afterwards!
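If you script your patching, one way to make sure HA really does come back on is to wrap the remediation in a context manager. The sketch below is only an illustration of that pattern: `ha_temporarily_disabled`, `get_ha` and `set_ha` are hypothetical names, and the "cluster" here is a plain dict standing in for a real vCenter connection.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def ha_temporarily_disabled(
    get_ha: Callable[[], bool],
    set_ha: Callable[[bool], None],
) -> Iterator[None]:
    """Turn HA off for the duration of the block, then restore the previous state."""
    was_enabled = get_ha()
    if was_enabled:
        set_ha(False)
    try:
        yield
    finally:
        # Restore HA even if remediation raised, but only if it was on before.
        if was_enabled:
            set_ha(True)


# Stand-in for a real cluster (assumption): a dict instead of a vCenter object.
cluster = {"ha": True}

with ha_temporarily_disabled(lambda: cluster["ha"], lambda v: cluster.update(ha=v)):
    print("HA during remediation:", cluster["ha"])
print("HA after remediation:", cluster["ha"])
```

In a real script the two callables would wrap whatever you use to talk to vCenter (PowerCLI, pyVmomi, the REST API); that wiring is not shown here. The `finally` block is the point: HA gets restored even when remediation fails halfway.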
By the way, as I was writing this, I found the following note in the vSphere 6.7 docs:
When you perform remediation on a cluster that consists of not more than two hosts, disabling HA admission control might not be enough to ensure successful remediation. You might need to disable vSphere Availability (HA) on the cluster. If you keep HA enabled, remediation attempts on host in the cluster fail, because HA cannot provide recommendation to Update Manager to place any of the hosts into maintenance mode. The reason is that if one of the two hosts is placed into maintenance mode there is no failover host left available in the cluster. To ensure successful remediation on a 2-node cluster, disable HA on the cluster or place the hosts in maintenance mode manually and then perform remediate the two host in the cluster.
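The rule from that note is simple enough to check before you start a remediation run. The function below is a minimal sketch of that pre-check; `ha_may_block_remediation` is a made-up name, and it only encodes the "not more than two hosts with HA enabled" condition from the docs, not any other reason a host might refuse to enter maintenance mode.

```python
def ha_may_block_remediation(host_count: int, ha_enabled: bool) -> bool:
    """Return True when the documented small-cluster HA condition applies.

    With HA enabled on a cluster of two hosts or fewer, putting one host
    into maintenance mode leaves no failover host available.
    """
    return ha_enabled and host_count <= 2


# Example values only; in practice these would come from your inventory.
for hosts, ha in [(2, True), (2, False), (3, True)]:
    print(hosts, "hosts, HA enabled:", ha, "->", ha_may_block_remediation(hosts, ha))
```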