If you have decided to go with disk mirroring inside your guest operating system across two SAN LUNs, for whatever reason (e.g. you want to bring another level of protection for your application and your SAN environment does not support a “stretched cluster” implementation, there is no support for synchronous mirroring, budget constraints, etc.), there are special configuration parameters you should add to your virtual machine configuration file (*.vmx) to make the setup work correctly and stay fault tolerant from the storage perspective.
It will work even without them, however in case of a failure of one of the storage devices you won’t get the desired behavior (I/O against the failed disk will just be held, so the VM will appear somewhat “frozen”).
Note: the procedure below is taken from the vSphere Storage Guide. I would say the same applies to mirroring inside Linux VMs as well.
Prerequisites
Use a Windows virtual machine that supports dynamic disks.
Required privilege: Advanced
Procedure
- Create a virtual machine with two virtual disks. Make sure to place the disks on different datastores/arrays.
- Log in to your virtual machine and configure the disks as dynamic mirrored disks. See Microsoft documentation.
- After the disks synchronize, power off the virtual machine.
- Change virtual machine settings to allow the use of dynamic disk mirroring.
a. Right-click the virtual machine and select Edit Settings.
b. Click the VM Options tab and expand the Advanced menu.
c. Click Edit Configuration next to Configuration Parameters.
d. Click Add Row and add the following parameters (replace “#” with the SCSI number used by your mirrored disks; see the example after this procedure):

| Name | Value |
| --- | --- |
| scsi#.returnNoConnectDuringAPD | True |
| scsi#.returnBusyOnNoConnectStatus | False |

- Click OK.
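For illustration only, once the rows are added the corresponding entries in the VM’s .vmx file should end up looking roughly like the sketch below. The “scsi1” prefix is just an assumption for this example; use whichever SCSI number your mirrored disks actually sit behind.

```
# Sketch of the resulting .vmx entries, assuming the mirrored virtual disks
# sit behind SCSI number 1; adjust "scsi1" to match your VM.
scsi1.returnNoConnectDuringAPD = "TRUE"
scsi1.returnBusyOnNoConnectStatus = "FALSE"
```

The procedure above already has the VM powered off before the change, which you should also respect if you decide to edit the .vmx file directly instead of going through Edit Settings.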
Hi Dusan,
will this work also when one of the virtual disks is inaccessible? We have issues when we simulate a storage failure (removed host mappings for DS LUNs at the storage level). The goal is to have automated failover for mission critical VMs.
ESXi is not going to start up the VM since one of the virtual disks (-flat.vmdk files) is not accessible. We have to change the VM config manually and start it thereafter.
Hi Pavol,
If I understand this correctly, you have in-guest mirroring active and you want to test the scenario where one site goes down completely and HA has to be triggered – the VM has to be rebooted at the other site.
I have never tested this scenario myself, but I would say it will just not boot. The mentioned parameters (based on their syntax) only send a different signal to the VM when APD occurs; I really doubt they will force VMware to skip the vmdk accessibility test during the VM power-on event. I would recommend you ask VMware directly how to force a VM to boot even with inaccessible vmdk’s, I do believe they have a special attribute for this 😉
Let me know the result please. If you don’t have VMware support, just let me know as well. This question is actually quite good and I’m really interested to know how to approach it, so I can ask them myself, but it can take some time.
Hi Dusan,
Yes, your understanding is proper. We didn’t simulate only APD, but also an ESX failure resulting in a SITE DOWN scenario. Thanx for correcting me and I’m happy you got it.
OK, we will try to resolve this issue with VMware. We have Metro Mirror at the SVC level, but not Site Recovery Manager (some internal experts think it is too complex to manage), so no automatic failover is possible. Also we do not have a stretched cluster (SVC stretched cluster does not have consistency groups etc.) and we are only looking forward to SVC HyperSwap introduced in 7.5 (I know you have a perfect understanding of SVC).
But according to your post… if we set returnNoConnectDuringAPD and returnBusyOnNoConnectStatus, our VM should continue working in case of an APD event (assuming, of course, the ESX host itself is still running).
This is very important, because we need to solve automatic failover in case of an ESX failure or an APD event on one storage. The business accepts there will be manual intervention in case of a heavy disaster like SITE DOWN.
Regards
Pavol
Hi, “too complex to manage”… well, there are some reasons why not to use SRM, but this one is coming from an “expert”? 😀 A lazy or unskilled one… Maybe true if you need to protect 1 VM (you didn’t mention the number, but let’s assume at least 20). Trust me, in-guest mirroring will eventually be more complex to manage across a large number of VMs. Honestly I don’t even know if such a configuration is supported by the OS vendors, although it is technically possible, indeed. My post was meant for a small business environment, not someone who can afford SVC 😉 in my opinion it is certainly not to be used in an enterprise. However if you do require transparent failover without a stretched cluster (metro mirror), I’m afraid there is not much you can do, at least nowadays. BTW Fault Tolerance in vSphere 6 also does storage mirroring, but you are limited in the number of such VMs per host, quite a lot…
Hi Dusan,
well, it was also from VMware consultants who have used SRM with SVC in a different company (a friendly company :)). They told us SRM was trouble after every single SVC upgrade (they had SVC as well). From my point of view this is quite strange, as most clusters (aka PowerHA) are just using SSH commands to control Metro Mirror consistency groups, so there shouldn’t be many issues regarding upgrades.
I’m not a VMware expert, nor a VMware admin, I’m only interested due to quite a lot of issues in our company. From my perspective, I love host based mirroring with IBM AIX and PowerHA 7.1.3 SE. Metro is also great, the only thing which I do not like as much is the linked cluster topology with PowerHA EE (and unfortunately you can’t have a basic cluster topology with Metro control).
Obviously, host based mirroring is not the best option in a VMware environment. I’d also go for SRM, or for HyperSwap in the future (unfortunately we have two 8-node clusters, one on each site, so we will have to wait for remote HyperSwap, local is not a solution at this moment).
Hope we will discuss more tomorrow, maybe you will be slightly surprised 🙂
Hi, thanks for this post – really helped. Just a small point – it might be helpful to note that in the “iSCSI#:…” lines, the “#” represents the SCSI device number – don’t just use a “#”! In my case it was “iSCSI3:…”
Richard
Hello Richard, glad it helped and you figured it out, let’s hope your comment will be explanatory enough for the others as well 😉 Btw you kinda injected an “i” into SCSI, which is obviously a mistake (the parameters start with “scsi”, not “iSCSI”).
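To spell that out for anyone else who hits the same confusion: the prefix is “scsi”, not “iSCSI”, and it is followed by a dot before the option name. In Richard’s case (SCSI number 3, per his comment) the entries would presumably read:

```
# Assuming SCSI number 3, as in Richard's case; note there is no leading "i"
# and the option name follows a dot.
scsi3.returnNoConnectDuringAPD = "TRUE"
scsi3.returnBusyOnNoConnectStatus = "FALSE"
```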