vCenter HA over Layer 3 Network: Step-by-Step Guide


Intro

To clarify, vCenter HA over Layer 3 means that the vCenter Active, Passive and Witness nodes run on different subnets. I am not talking about the HA network here; I am talking about the Management network. This of course means that you will need to rewrite your DNS record after a failover, but more about that later in this article.

The topic of vCenter HA is more or less covered on the web. VMware has a KB about this configuration. My friend David (@david_pasek) also wrote an article about this. The problem is that none of those articles worked as an end-to-end deployment manual for me. So let me cover the steps I took to make the configuration work; it might be helpful for someone else.

Scope

As I covered in yesterday’s article, we have a Stretched Cross-Site environment, with dedicated Management and Edge clusters in each site and a Stretched vSAN cluster for Compute (customer workloads). There is also a third site which hosts witness components. All of this is managed by a single vCenter.

I am running vSphere 6.5 and, to utilise the full potential of vCenter HA, I am using an embedded PSC (which is recommended as of vSphere 6.5 U2). Here is what my HA config looks like.

Configuration Step-by-Step

I will assume that you have already prepared all the necessary PortGroups for your vCenter HA configuration, so I will start from the point where you need to add an interface to the Primary vCenter appliance.

Prepare vCenter VM

1. Manually add a second vNIC to the Primary vCenter appliance VM. Make sure the type is VMXNET3.

2. Log in to the VAMI of the Primary vCenter appliance and configure the HA network IP (do not specify any gateway IP on this interface).

3. Add a temporary A record to your DNS: a record for your vCenter Server domain name pointing to the Management IP of the Secondary vCenter appliance. Without this, the vCenter HA wizard will throw an error.
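
Before starting the wizard it can be worth confirming that the temporary record actually resolves. The following is only a minimal sketch, assuming `dig` is available on the machine you run it from; the function name `check_a_record`, the FQDN and the 192.0.2.x address are hypothetical placeholders and not values from this setup.

```shell
# Succeeds when <fqdn> currently resolves to <expected-ip>.
# Usage: check_a_record <fqdn> <expected-ip>
check_a_record() {
    local fqdn="$1" expected="$2" answers
    # +short prints only the addresses, one per line.
    answers=$(dig +short "$fqdn" A)
    echo "$fqdn resolves to: ${answers:-<nothing>}"
    # Exact, whole-line match against the expected address.
    echo "$answers" | grep -qxF "$expected"
}

# Hypothetical example (placeholder name and address):
# check_a_record vcenter.example.com 192.0.2.20 && echo "record is in place"
```

The same check can be reused after a failover to confirm that the record points at the node you expect.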

Go through the vCenter HA wizard

4. Now go to the vCenter HA settings and click Configure. In the Configure vCenter HA wizard select the Advanced option, click Next, and on the Connection IP settings page click Advanced for the Passive Node IP configuration.

5. Provide all the IP information for the HA interface, make sure you select the Override management network upon failure checkbox, and provide the IP details for the Management interface of the Passive Node. Once done, click OK.

6. Back on the Connection IP settings page, provide the HA interface IP details for the Witness Node. Click Next and wait for validation to finish.

7. On Clone VM page do not click Finish! Minimize the window to the Work in Progress pane.

Clone the VMs

8. Now we can proceed with Cloning the vCenter VM, but before doing that remove the DNS record you added in Step 3.

On vCenter 6.7, make sure the shell is enabled on the vCenter appliance and that its timeout is long enough. It might be a good idea to set it to one day for the duration of the cloning.

9. Right-click your vCenter Appliance VM and select Clone > Clone to Virtual Machine.
>a. Provide VM Name for Passive Node, click Next.
>b. Place the VM on Management Cluster on Site 2.
>c. Select Datastore to hold the VM.
>d. On Select clone options page, check the Customize the Operating system checkbox, and make sure Customize this Virtual Machine’s hardware and Power on virtual machine after creation checkboxes are not checked.
>e. Click Next.

>f. Create new Customization Specification for Passive Node.
>> i. The hostname should be the same as the Primary vCenter's. Make sure to add the Domain Name.
>> ii. Even though the Passive Node is in a different site, make sure to select the same Timezone as the Primary vCenter Server.
>> iii. Modify the NIC IP settings. One NIC for Management traffic (specify GW), second NIC for HA traffic (do not specify GW).
>> iv. Provide DNS server details.
>> v. Once done click Next and Finish.

>g. Back in the Clone Existing Virtual Machine wizard, select the created Customization Specification and click Next and Finish.

Note: Sometimes, although Customize this Virtual Machine’s hardware is unchecked, it gets checked automatically after you click Next. In this case go back and uncheck it again.

10. Wait for the cloning to complete. Do not start the VM yet. Edit the VM settings and fix the vNIC to PortGroup associations; they will be blank after cloning. Make sure the Connected checkbox is selected for the vNICs.

11. Once NICs are associated with proper PortGroups, start the VM. Wait for it to boot and make sure correct IPs are assigned.

12. Clone the Primary vCenter appliance VM once again, this time to be used as the Witness Node. The process is mostly the same as cloning the Secondary Node, with some exceptions.
>a. Provide VM Name for Witness Node, click Next. 
>b. Place the VM on Witness Cluster on Site 3
>c. Select Datastore to hold the VM. 
>d. On Select clone options page, check the Customize the Operating system checkbox and make sure Customize this Virtual Machine’s hardware and Power on virtual machine after creation checkboxes are not checked. 
>e. Click Next.
>f. Create new Customization Specification for Witness Node.
>> i. The hostname should be the same as the Primary vCenter's. Make sure to add the Domain Name.
>> ii. Even though the Witness Node is in a different site, make sure to select the same Timezone as the Primary vCenter Server.
>> iii. There is no need to configure the Management NIC for the Witness Node. Configure only the second NIC for HA traffic (do not specify GW).
>> iv. Provide DNS server details.
>> v. Once done click Next and Finish.
>g. Back in the Clone Existing Virtual Machine wizard, select the created Customization Specification and click Next and Finish.

13. Once cloning is complete, do not start the VM right away. As with the Passive Node, first fix the vNIC to PortGroup associations and optionally decrease the CPU and RAM allocation for the Witness Node, then power it on.

Fix the routing

14. Once all Nodes are up, we need to log in to each one and add static routes so that each Node can reach its partners. Open a console or SSH to each node, switch to the Linux shell and add the static routes by editing the relevant config files. Below is an example from my configuration.

>a. On Primary Node, edit the /etc/systemd/network/10-eth1.network file and add the following lines.

# Static route to HA interface of Secondary node
[Route]
Gateway=10.72.143.249
Destination=10.45.84.170/32

# Static route to HA interface of Witness Node
[Route]
Gateway=10.72.143.249
Destination=10.164.78.66/32

>b. On Secondary Node edit the /etc/systemd/network/10-eth1.network file and add the following lines

# Static Route to Primary Node
[Route]
Gateway=10.45.84.169
Destination=10.72.143.250/32

# Static Route to Witness Node
[Route]
Gateway=10.45.84.169
Destination=10.164.78.66/32

>c. On Witness Node edit the /etc/systemd/network/10-eth1.network file and add the following lines.

# Static route to Primary Node
[Route]
Gateway=10.164.78.65
Destination=10.72.143.250/32

# Static Route to Secondary Node
[Route]
Gateway=10.164.78.65
Destination=10.45.84.170/32

>d. Once the config files are edited, restart the networking service for the changes to take effect by running:
systemctl restart systemd-networkd
Test the communication to make sure Nodes can access each other on the HA network.
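
The communication test can be scripted so that the same check runs on every node. What follows is a minimal sketch rather than a definitive procedure, assuming the standard `ip` and `ping` utilities are present in the appliance's Linux shell; the function name `check_ha_peers` is hypothetical. It prints the route the kernel would pick for each peer (which should go out through eth1 via the HA gateway) and then pings the peer.

```shell
# Check that each HA peer is routed as expected and answers ping.
# Usage: check_ha_peers <peer-ha-ip> [<peer-ha-ip> ...]
check_ha_peers() {
    local failed=0 peer
    for peer in "$@"; do
        # Show which interface and gateway would be used to reach this peer.
        ip route get "$peer"
        # Two echo requests, waiting up to two seconds for each reply.
        if ping -c 2 -W 2 "$peer" > /dev/null 2>&1; then
            echo "OK: $peer reachable"
        else
            echo "FAIL: $peer unreachable"
            failed=1
        fi
    done
    # Non-zero exit status when at least one peer did not answer.
    return "$failed"
}

# Example for the Primary Node, using the HA addresses from this article:
# check_ha_peers 10.45.84.170 10.164.78.66
```

If a peer fails, compare the `ip route get` output with the [Route] entries above before looking at the physical network.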

Finalize the Setup

15. Maximize the Configure vCenter HA wizard which we minimized in Step 7 and click Finish. Monitor the process and in several minutes you should see an image similar to the one below.

Failover

It is important to note that to perform a full failover to the Secondary Node in this scenario you will have to rewrite the DNS record for the vCenter Server every time. This can of course be automated, but it can also be performed manually.
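
As an illustration of what such automation might look like, here is a minimal sketch that builds an `nsupdate` batch to repoint the vCenter A record. It assumes your DNS server accepts RFC 2136 dynamic updates, which may not be true in your environment; the function name `vcenter_dns_update`, the server and zone names, the key path and the 192.0.2.x address are all hypothetical placeholders.

```shell
# Print an nsupdate batch that repoints one A record to a new address.
# Usage: vcenter_dns_update <dns-server> <zone> <fqdn> <new-ip> [ttl]
vcenter_dns_update() {
    local server="$1" zone="$2" fqdn="$3" new_ip="$4" ttl="${5:-300}"
    printf 'server %s\n' "$server"
    printf 'zone %s\n' "$zone"
    # Remove the existing A record(s) for the name, then add the new one.
    printf 'update delete %s. A\n' "$fqdn"
    printf 'update add %s. %s A %s\n' "$fqdn" "$ttl" "$new_ip"
    printf 'send\n'
}

# Hypothetical example: review the batch first, then pipe it to nsupdate.
# vcenter_dns_update ns1.example.com example.com vcenter.example.com 192.0.2.20
# vcenter_dns_update ns1.example.com example.com vcenter.example.com 192.0.2.20 \
#     | nsupdate -k /path/to/tsig.key
```

Printing the batch instead of sending it directly keeps the change reviewable; a short TTL on the record reduces how long clients keep resolving the old address after a failover.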

Closing words

vCenter HA over Layer 3 and NSX-v Manager Cross Site failover solutions can be used in conjunction, which will basically allow you to fail over your management components during a site failure. Hope this helps.

Once again, special thanks to Jack Cherkas (@jackcherkas) for helping with this article.

P.S. I sometimes use some terms interchangeably, so to clarify:
Passive Node = Secondary Node
Active Node = Primary Node

Aram Avetisyan is an IT specialist with more than 18 years experience. He has rich background in various IT related fields like Cloud, Virtualization and SDN. He holds several industry level certifications including but not limited to VCIX-DCV, VCIX-NV. He is also a vEXPERT in years 2014-2021.


22 Comments

  1. for me cloning fails, the cloned vcsa throws some error, services can not start – do you have any idea what could be the problem?

    thanks for the good article.

    • Hi, Not really, hard to say without any details. are you performing all the needed steps, customisation and such? what is the service name which is not able to start?

      • ok i now see that the vcenter kind of gets unavailable at the removing snapshots part of the cloning, and i kind of think this could be the cause, but how do i fix this problem?

        • Again hard to say. Cloning might fail due to many reasons. have you tried cloning another VM to see if the problem is general? Also, i did a fast google, and found some topics where it describes that cloning might fail if you use same datastore as destination for clone, especially if it’s NFS. May be you should raise this question on VMware communities or on reddit to get more opinions.

          • okay thank you for your help, i will try to figure out if there are general problems with the cloning as you mentioned

  2. You recommend in step 3: “Add temporary A record to your DNS.”

    When should this DNS record be deleted?

  3. when cloning for passive, for nic0 what ip to assign ?IP address of nic0 of vcenter or management ip address of passive ?

  4. I used passive management IP same which I mentioned as failover IP when deploying vcenter ha, but got in error, hostname for does not match with PNID of vcenter. any suggestions? PNID is FQDN but vcenter hostname is short name.

    • In step 12 > f >> iii, i specify the following “>>iii. There is no need to configure Management NIC for Witness Node. Configure only second NIC for HA traffic (do not specify GW).”

      Basically ignore first nic on Witness host, configure only second one for HA Network

      Hope this helps.

  5.
    Aaron Kightlinger

    Thanks for posting this article. I have a similar setup that I’m trying to deploy VCHA across but unable to get my HA network to talk to each other via PING. Thinking its my VDS/port group configuration.

    Do you have any posts or reference for how your Networking and Port Groups are configured for each site?

    I’m getting lost on how each site connects to each other being on separate vDS and port groups. Most of the guides show VCHA in a single site or subnet.

    • Hi Aaron,
      There is nothing specific on dvPortgroup level. The communication will be going over physical network, so if you are not able to communicate over HA network, first make sure you have all the needed static routes as covered in “Fix the Routing” section and second make sure there is nothing blocking that communication on physical network (e.g. Proper routing is in place, no firewall blocking the communication, etc.)

  6. Your notes about the temporary DNS entry and having to set a long timer for bash shell timeout on 6.7 really helped me out. I was banging my head against HA deployment until I came across this. Appreciate the post, Aram!

  7.
    Praseeth Kalleri Thazha

    It is a really good post Aram..! thanks for the detail information..!

  8. Really great article. So happy someone else needed cross-WAN (using management network) failover solution…thank you!
