Intro
To clarify, vCenter HA over Layer 3 means that the vCenter Active, Passive and Witness nodes run on different subnets. I am not talking about the HA network here; I am talking about the Management network. This of course means that you will need to rewrite your DNS record after a failover, but more about that later in this article.
The topic of vCenter HA is fairly well covered on the web. VMware has a KB about this configuration, and my friend David (@david_pasek) also wrote an article about it. The problem is that none of those articles worked for me as an end-to-end deployment guide. So let me cover the steps I took to make the configuration work; hopefully it will be helpful for someone else.
Scope
So, as I covered in yesterday’s article, we have a Stretched Cross-Site environment with dedicated Management and Edge clusters in each site and a Stretched vSAN cluster for Compute (customer workloads). A third site hosts the witness components. All of this is managed by a single vCenter.
I was running vSphere 6.5 and, to utilise the full potential of vCenter HA, I used an embedded PSC (which is recommended as of vSphere 6.5 U2). Here is what my HA configuration looks like.
Configuration Step-by-Step
I will assume that you have already prepared all the necessary PortGroups for your vCenter HA configuration, so I will start from the point where you need to add an interface to the Primary vCenter appliance.
Prepare vCenter VM
1. Manually add a second vNIC to the Primary vCenter appliance VM. Make sure its type is VMXNET3.
2. Log in to the VAMI of the Primary vCenter appliance and configure the HA network IP (do not specify a gateway IP on this interface).
3. Add a temporary A record to your DNS: an A record for your vCenter Server domain name pointing to the Management IP of the Secondary vCenter appliance. Without it, the vCenter HA wizard will throw an error.
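If your DNS server accepts RFC 2136 dynamic updates (BIND, for example), the temporary record from Step 3 and its removal in Step 8 can be scripted with `nsupdate`. This is a minimal sketch under that assumption; the DNS server address, FQDN, IP, TTL and key file below are hypothetical placeholders. On Windows DNS you would use the DnsServer PowerShell cmdlets instead.

```bash
#!/usr/bin/env bash
# Sketch: build nsupdate input for the temporary A record used during
# vCenter HA setup (Step 3 adds it, Step 8 removes it).
# Assumes an RFC 2136-capable DNS server; all values are placeholders.

# temp_record_script ACTION DNS_SERVER FQDN IP [TTL]
#   ACTION is "add" or "delete"; prints an nsupdate script to stdout.
temp_record_script() {
  local action=$1 server=$2 fqdn=$3 ip=$4 ttl=${5:-300}
  case "$action" in
    add)
      printf 'server %s\nupdate add %s %s A %s\nsend\n' \
        "$server" "$fqdn" "$ttl" "$ip" ;;
    delete)
      # Deletes only the A record with this exact IP, so the record
      # pointing at the Active node is left untouched.
      printf 'server %s\nupdate delete %s A %s\nsend\n' \
        "$server" "$fqdn" "$ip" ;;
    *)
      echo "usage: temp_record_script add|delete SERVER FQDN IP [TTL]" >&2
      return 1 ;;
  esac
}
```

Usage would be something like `temp_record_script add 192.0.2.53 vcenter.example.com 198.51.100.20 | nsupdate -k /path/to/ddns.key` before the wizard, and the same call with `delete` in Step 8.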
Go through the vCenter HA wizard
4. Now go to the vCenter HA settings and click Configure. In the Configure vCenter HA wizard, select the Advanced option, click Next, and on the Connection IP settings page click Advanced for the Passive Node IP configuration.
5. Provide all the IP information for the HA interface, make sure you select the Override management network upon failure checkbox, and provide the IP details for the Management interface of the Passive Node. Once done, click OK.
6. Back on the Connection IP settings page, provide the HA interface IP details for the Witness Node. Click Next and wait for validation to finish.
7. On the Clone VM page, do not click Finish! Minimize the window to the Work in Progress pane.
Clone the VMs
8. Now we can proceed with cloning the vCenter VM, but before doing that, remove the DNS record you added in Step 3.
On vCenter 6.7, make sure the shell is enabled on the vCenter appliance and has a long enough timeout set. It might be a good idea to set it to one day for the duration of the cloning.
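On 6.7 the shell setting can also be changed from a script. A minimal sketch, assuming the vSphere 6.7 appliance REST API (`/rest/com/vmware/cis/session` for login and `/rest/appliance/access/shell` for the BASH shell setting, whose timeout is in seconds with a documented maximum of 86400, i.e. one day); verify the paths against the API Explorer of your build. The hostname and user are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: enable the BASH shell on a vCenter 6.7 appliance for one day
# through the appliance REST API. Hostname and credentials are placeholders.

# shell_config_body SECONDS -> JSON body for PUT /rest/appliance/access/shell
shell_config_body() {
  local timeout=${1:-86400}   # 86400 s = 1 day, the documented maximum
  printf '{"config":{"enabled":true,"timeout":%d}}' "$timeout"
}

# enable_shell VC_FQDN USER [SECONDS]
#   curl prompts for the password because only the user name is given.
#   -k skips TLS verification; drop it if your VMCA root CA is trusted.
enable_shell() {
  local vc=$1 user=$2 timeout=${3:-86400} session
  # Create an API session; the response is typically {"value":"<token>"}.
  session=$(curl -sk -X POST -u "$user" \
      "https://${vc}/rest/com/vmware/cis/session" |
    sed -n 's/.*"value":"\([^"]*\)".*/\1/p')
  [[ -n "$session" ]] || { echo "login failed" >&2; return 1; }
  curl -sk -X PUT \
    -H "vmware-api-session-id: ${session}" \
    -H "Content-Type: application/json" \
    -d "$(shell_config_body "$timeout")" \
    "https://${vc}/rest/appliance/access/shell"
}
```

For example, `enable_shell vcenter.example.com administrator@vsphere.local` would request the one-day timeout suggested above. The same setting is available in the VAMI if you prefer to click through it.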
9. Right-click your vCenter appliance VM and select Clone > Clone to Virtual Machine.
>a. Provide a VM name for the Passive Node and click Next.
>b. Place the VM on the Management Cluster in Site 2.
>c. Select a datastore to hold the VM.
>d. On the Select clone options page, check the Customize the operating system checkbox, and make sure the Customize this virtual machine’s hardware and Power on virtual machine after creation checkboxes are not checked.
>e. Click Next.
>f. Create a new Customization Specification for the Passive Node.
>> i. The hostname should be the same as that of the Primary vCenter. Make sure to add the domain name.
>> ii. Even though the Passive Node is in a different site, make sure to select the same timezone as the Primary vCenter Server.
>> iii. Modify the NIC IP settings: one NIC for Management traffic (specify the gateway) and the second NIC for HA traffic (do not specify a gateway).
>> iv. Provide the DNS server details.
>> v. Once done, click Next and Finish.
>g. Back in the Clone Existing Virtual Machine wizard, select the created Customization Specification and click Next and Finish.
Note: Sometimes, although Customize this virtual machine’s hardware is unchecked, for whatever reason it gets checked automatically after you click Next. In that case, go back and uncheck it again.
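If you would rather script the clone than fight the wizard (including the checkbox quirk above), govc can do the same thing. A minimal sketch, assuming govc is installed, `GOVC_URL`, `GOVC_USERNAME` and `GOVC_PASSWORD` are exported, and the customization specification from Step 9f already exists in vCenter; the VM, spec, resource pool and datastore names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: clone the Active vCenter VM with govc, apply an existing guest
# customization spec and leave the clone powered off (as in Step 9).
# Set GOVC=echo to print the command instead of running it (dry run).

# clone_vcsa SOURCE_VM NEW_NAME CUSTOMIZATION_SPEC RESOURCE_POOL DATASTORE
clone_vcsa() {
  local src=$1 name=$2 spec=$3 pool=$4 ds=$5
  # -on=false keeps the clone powered off so the vNICs can be fixed first.
  "${GOVC:-govc}" vm.clone \
    -vm "$src" \
    -customization "$spec" \
    -pool "$pool" \
    -ds "$ds" \
    -on=false \
    "$name"
}
```

For the Passive Node this might be `clone_vcsa vcsa01 vcsa01-passive vcha-passive-spec /DC/host/MGMT-Site2/Resources ds-site2`; the Witness Node clone in Step 12 works the same way with its own spec, pool and datastore.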
10. Wait for the cloning to complete. Do not start the VM yet. Edit the VM settings and fix the vNIC-to-PortGroup associations; as you can see in the screenshots, they will be blank after cloning. Make sure the Connected checkbox is selected for the vNICs.
11. Once the NICs are associated with the proper PortGroups, start the VM. Wait for it to boot and make sure the correct IPs are assigned.
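To check the addresses from the appliance shell, you can compare the output of `ip -o -4 addr show dev <nic>` with what you put in the customization spec. A small sketch of that check; the function name is hypothetical and the interface names eth0/eth1 follow the appliance naming used later in this article.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a NIC carries the expected IPv4 address after
# guest customization (Step 11). Run on the cloned appliance's shell.

# addr_has_ip "IP_ADDR_OUTPUT" EXPECTED_IP
#   IP_ADDR_OUTPUT is the text of `ip -o -4 addr show dev <nic>`.
addr_has_ip() {
  local output=$1 expected=$2
  # Fixed-string match on "inet A.B.C.D/" so 10.0.0.17 does not
  # accidentally match 10.0.0.170.
  grep -qF "inet ${expected}/" <<<"$output"
}
```

On the Passive Node from this article that would be, for example, `addr_has_ip "$(ip -o -4 addr show dev eth1)" 10.45.84.170 && echo "eth1 OK"`, plus the same check for eth0 with its Management IP.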
12. Clone the Primary vCenter appliance VM once again, this time to be used as the Witness Node. The process is mostly the same as cloning the Secondary Node, with a few exceptions.
>a. Provide a VM name for the Witness Node and click Next.
>b. Place the VM on the Witness Cluster in Site 3.
>c. Select a datastore to hold the VM.
>d. On the Select clone options page, check the Customize the operating system checkbox, and make sure the Customize this virtual machine’s hardware and Power on virtual machine after creation checkboxes are not checked.
>e. Click Next.
>f. Create a new Customization Specification for the Witness Node.
>> i. The hostname should be the same as that of the Primary vCenter. Make sure to add the domain name.
>> ii. Even though the Witness Node is in a different site, make sure to select the same timezone as the Primary vCenter Server.
>> iii. There is no need to configure the Management NIC for the Witness Node. Configure only the second NIC, for HA traffic (do not specify a gateway).
>> iv. Provide the DNS server details.
>> v. Once done, click Next and Finish.
>g. Back in the Clone Existing Virtual Machine wizard, select the created Customization Specification and click Next and Finish.
13. Once cloning is complete, do not start the VM. As with the Passive Node, fix the vNIC-to-PortGroup associations and, optionally, decrease the CPU and RAM allocation for the Witness Node.
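The post-clone fixes from Steps 10 and 13 can also be done with govc while the clones are still powered off. A minimal sketch, assuming govc with `GOVC_URL` and friends exported and the default govc device names `ethernet-0` (Management) and `ethernet-1` (HA); the VM and port group names and the CPU/RAM values are hypothetical, so size the Witness Node according to VMware's guidance for your version.

```bash
#!/usr/bin/env bash
# Sketch: re-attach the clone's vNICs to the right port groups, mark them
# connected, and optionally shrink the Witness Node.
# Set GOVC=echo to print the commands instead of running them (dry run).

# fix_nics VM MGMT_PORTGROUP HA_PORTGROUP
fix_nics() {
  local vm=$1 mgmt_pg=$2 ha_pg=$3 govc=${GOVC:-govc}
  "$govc" vm.network.change -vm "$vm" -net "$mgmt_pg" ethernet-0 || return
  "$govc" vm.network.change -vm "$vm" -net "$ha_pg" ethernet-1 || return
  # Mark both vNICs as connected (the Connected checkbox from Step 10).
  "$govc" device.connect -vm "$vm" ethernet-0 ethernet-1
}

# shrink_vm VM CPUS MEMORY_MB   (optional, Witness Node only)
shrink_vm() {
  local vm=$1 cpus=$2 mem=$3
  "${GOVC:-govc}" vm.change -vm "$vm" -c "$cpus" -m "$mem"
}
```

Run `fix_nics` for both clones, call `shrink_vm` only for the Witness Node, and double-check the vNIC settings in the UI before powering the VMs on.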
Fix the routing
14. Once all nodes are up, we need to log in to each one and add static routes so that each node can reach its partners. Open a console or SSH session to each node, switch to the Linux shell, and add the static routes by editing the relevant config files. Below is an example from my configuration.
>a. On the Primary Node, edit the /etc/systemd/network/10-eth1.network file and add the following lines:
```ini
# Static route to HA interface of Secondary Node
[Route]
Gateway=10.72.143.249
Destination=10.45.84.170/32

# Static route to HA interface of Witness Node
[Route]
Gateway=10.72.143.249
Destination=10.164.78.66/32
```
>b. On the Secondary Node, edit the /etc/systemd/network/10-eth1.network file and add the following lines:
```ini
# Static route to Primary Node
[Route]
Gateway=10.45.84.169
Destination=10.72.143.250/32

# Static route to Witness Node
[Route]
Gateway=10.45.84.169
Destination=10.164.78.66/32
```
>c. On the Witness Node, edit the /etc/systemd/network/10-eth1.network file and add the following lines:
```ini
# Static route to Primary Node
[Route]
Gateway=10.164.78.65
Destination=10.72.143.250/32

# Static route to Secondary Node
[Route]
Gateway=10.164.78.65
Destination=10.45.84.170/32
```
>d. Once the config files are edited, restart the networking service for the changes to take effect:

```bash
systemctl restart systemd-networkd
```
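Because each node just needs one /32 host route per peer via its local HA gateway, the stanzas above can be generated instead of typed. A minimal sketch; the function names are hypothetical, and it assumes the same file path as above and that it runs as root on the node.

```bash
#!/usr/bin/env bash
# Sketch: generate the [Route] sections from Step 14 for one node and
# append them to its eth1 .network file.

# route_stanzas GATEWAY PEER_IP...
#   Prints one host route (/32) per peer HA IP, all via GATEWAY.
route_stanzas() {
  local gw=$1; shift
  local peer
  for peer in "$@"; do
    printf '\n# Static route to HA interface %s\n[Route]\nGateway=%s\nDestination=%s/32\n' \
      "$peer" "$gw" "$peer"
  done
}

# apply_routes GATEWAY PEER_IP...
#   Backs up the file, appends the routes, then restarts systemd-networkd.
apply_routes() {
  local file=/etc/systemd/network/10-eth1.network
  cp -p "$file" "${file}.bak" || return
  route_stanzas "$@" >>"$file" || return
  systemctl restart systemd-networkd
}
```

With the addresses from this article: on the Primary Node `apply_routes 10.72.143.249 10.45.84.170 10.164.78.66`, on the Secondary Node `apply_routes 10.45.84.169 10.72.143.250 10.164.78.66`, and on the Witness Node `apply_routes 10.164.78.65 10.72.143.250 10.45.84.170`. Run it once per node; re-running appends duplicate sections.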
Test the communication to make sure the nodes can reach each other over the HA network.
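One way to test is to confirm that the kernel actually picks the new route for each peer and that the peer answers on its HA IP. A sketch using iproute2's `ip route get` and `ping` bound to eth1; the function names are hypothetical, and it assumes ICMP is permitted between the sites.

```bash
#!/usr/bin/env bash
# Sketch: check each peer's HA IP is routed out of eth1 via the expected
# gateway and answers ping.

# route_ok "IP_ROUTE_GET_OUTPUT" GATEWAY DEV
#   True if the output of `ip route get <peer>` uses GATEWAY on DEV.
route_ok() {
  local out=$1 gw=$2 dev=$3
  [[ "$out" == *" via ${gw} dev ${dev} "* ]]
}

# check_peers GATEWAY PEER_IP...
check_peers() {
  local gw=$1; shift
  local peer rc=0
  for peer in "$@"; do
    if route_ok "$(ip route get "$peer")" "$gw" eth1 &&
       ping -c 3 -W 2 -I eth1 "$peer" >/dev/null; then
      echo "OK   $peer"
    else
      echo "FAIL $peer"; rc=1
    fi
  done
  return "$rc"
}
```

On the Primary Node, for example: `check_peers 10.72.143.249 10.45.84.170 10.164.78.66`. A FAIL with a correct route usually points at the physical network (routing or firewalls between sites) rather than the appliance.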
Finalize the Setup
15. Maximize the Configure vCenter HA wizard that we minimized in Step 7 and click Finish. Monitor the process; after several minutes you should see an image similar to the one below.
Failover
It is important to note that to perform a full failover to the Secondary Node in this scenario, you will have to rewrite the DNS record for the vCenter Server every time. This can of course be automated, but it can also be performed manually.
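If you go the automation route and your DNS server accepts RFC 2136 dynamic updates, the rewrite boils down to replacing the A record set in a single update. A minimal sketch under that assumption; the server, FQDN, IP, TTL and key file are hypothetical placeholders, and a short TTL simply helps clients pick up the new address sooner.

```bash
#!/usr/bin/env bash
# Sketch: repoint the vCenter A record at the Passive node's Management IP
# after a failover, using nsupdate. All values are placeholders.

# failover_dns_script DNS_SERVER FQDN NEW_IP [TTL]
failover_dns_script() {
  local server=$1 fqdn=$2 new_ip=$3 ttl=${4:-60}
  # Remove every A record for the name, then add the new one; updates
  # before a single "send" go out as one UPDATE message.
  printf 'server %s\nupdate delete %s A\nupdate add %s %s A %s\nsend\n' \
    "$server" "$fqdn" "$fqdn" "$ttl" "$new_ip"
}
```

For example: `failover_dns_script 192.0.2.53 vcenter.example.com 198.51.100.20 | nsupdate -k /path/to/ddns.key`, and the same with the Active node's IP when failing back.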
Closing words
vCenter HA over Layer 3 and NSX-v Manager Cross-Site failover solutions can be used in conjunction, which basically allows you to fail over your management components during a site failure. Hope this helps.
Once again, special thanks to Jack Cherkas (@jackcherkas) for helping with this article.
P.S. I sometimes use some terms interchangeably, so to clarify:
Passive Node = Secondary Node
Active Node = Primary Node
Latest posts by Aram Avetisyan (see all)
Good article, Aram.
For me cloning fails: the cloned VCSA throws some error and services cannot start. Do you have any idea what the problem could be?
Thanks for the good article.
Hi, not really; it is hard to say without any details. Are you performing all the needed steps, customisation and such? What is the name of the service that is not able to start?
OK, I now see that the vCenter kind of becomes unavailable during the snapshot-removal part of the cloning, and I think this could be the cause, but how do I fix this problem?
Again, hard to say. Cloning can fail for many reasons. Have you tried cloning another VM to see if the problem is general? Also, I did a quick Google search and found some threads describing that cloning might fail if you use the same datastore as the clone destination, especially if it is NFS. Maybe you should raise this question on the VMware Communities or on Reddit to get more opinions.
Okay, thank you for your help. I will try to figure out whether there are general problems with cloning, as you mentioned.
You recommend in Step 3: “Add temporary A record to your DNS.”
When should this DNS record be deleted?
As it is written, in Step 8.
When cloning for the Passive Node, what IP should be assigned to nic0? The IP address of nic0 of the vCenter, or the management IP address of the Passive Node?
I used the same Passive management IP that I specified as the failover IP when deploying vCenter HA, but got an error: hostname for does not match with PNID of vcenter. Any suggestions? The PNID is an FQDN but the vCenter hostname is a short name.
In Step 12 > f >> iii, I specify the following: “>>iii. There is no need to configure Management NIC for Witness Node. Configure only second NIC for HA traffic (do not specify GW).”
Basically, ignore the first NIC on the Witness host and configure only the second one for the HA network.
Hope this helps.
Thanks! But I am asking about the Passive Node, Step 9 > f >> iii.
Sorry, got confused. Have you added Temporary DNS record pointing to IP of Secondary Appliance like described in step 3?
3. Add temporary A record to your DNS. A record for you vCenter Server Domain Name pointing to the Management IP of Secondary vCenter appliance.
Nvm, I got it working. Thanks for the write-up.
Thanks for posting this article. I have a similar setup that I’m trying to deploy VCHA across, but I am unable to get my HA network nodes to talk to each other via ping. I think it’s my VDS/port group configuration.
Do you have any posts or reference for how your Networking and Port Groups are configured for each site?
I’m getting lost on how each site connects to each other being on separate vDS and port groups. Most of the guides show VCHA in a single site or subnet.
Hi Aaron,
There is nothing specific at the dvPortgroup level. The communication goes over the physical network, so if you are not able to communicate over the HA network, first make sure you have all the needed static routes as covered in the “Fix the routing” section, and second make sure nothing is blocking that communication on the physical network (e.g. proper routing is in place, no firewall is blocking the communication, etc.).
Looks like I had a route config issue on my physical network, thanks!
Your notes about the temporary DNS entry and having to set a long timer for bash shell timeout on 6.7 really helped me out. I was banging my head against HA deployment until I came across this. Appreciate the post, Aram!
Glad it was useful 🙂
It is a really good post, Aram! Thanks for the detailed information!
Really great article. So happy someone else needed cross-WAN (using management network) failover solution…thank you!
Hey Jack, before you implement this in production, consult VMware. They have changed their support stance in vSphere 7 and no longer recommend doing this.