In this article I will show how to replace a failed HDD (or SSD) in a VMware vSAN cluster running on Softlayer Cloud.
1. Open the Web Client and navigate to: Hosts and Clusters -> choose the cluster -> Manage -> Settings -> Virtual SAN -> Disk Management
You will see all ESXi hosts in the vSAN-enabled cluster and all vSAN disk groups.
2. Choose the disk group and identify the failed HDD.
3. Remove the disk from the disk group.
The “Full data migration” mode will most likely not work, because the disk has failed and its data can’t be read, so choose “No data migration”.
Wait until the failed disk disappears from the list in the disk group.
4. Log in to the ESXi host via SSH.
Enter the command: /opt/lsi/storcli/storcli /c0 show
(StorCLI must be installed on the ESXi host; download it from here, find “Latest MegaRAID Storcli” and install StorCLI.)
You will get something like the output below. I’m skipping some unimportant info here. You need the “VD LIST” and “PD LIST” sections:
[root@hostname:~] /opt/lsi/storcli/storcli /c0 show
Generating detailed summary of the adapter, it may take a while to complete.
[...]
VD LIST :
=======
-------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC Size       Name
-------------------------------------------------------------------
0/0   RAID1 Optl  RW     Yes     RWBD  -   ON  931.0 GB   RAID1-A
1/1   RAID0 Optl  RW     Yes     NRWTD -   ON  744.687 GB VSAN-SSD
2/2   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
3/3   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
4/4   RAID0 OfLn  RW     No      RWTD  -   ON  1.818 TB   VSAN
5/5   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
-------------------------------------------------------------------
[...]
PD LIST :
=======
-------------------------------------------------------------------------------------
EID:Slt DID State  DG Size       Intf Med SED PI SeSz Model                  Sp Type
-------------------------------------------------------------------------------------
8:0       9 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:1      20 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:2      11 Onln    1 744.687 GB SATA SSD N   N  512B INTEL SSDSC2BA800G4    U  -
8:3      16 Onln    2 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:4      10 Onln    3 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:5      18 Failed  4 1.818 TB   SATA HDD N   N  512B -                      U  -
8:6      12 Onln    5 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
-------------------------------------------------------------------------------------
[...]
5. Identify the failed virtual drive (state OfLn) in VD LIST and the failed physical drive in PD LIST. In this case they are 4/4 in VD LIST and 8:5 in PD LIST.
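Picking the failed rows out of a long listing by eye is error-prone, so the lookup in step 5 can be sketched as two small shell filters. This is only an illustration: the function names failed_pds and degraded_vds are made up for this example, and the filters assume the column layout shown above (State is the third column in both lists).

```shell
# Print EID:Slt of every physical drive whose State column is Failed or UBad.
# Reads "storcli /c0 show" output on stdin; assumes State is the 3rd column.
failed_pds() {
    awk '$1 ~ /^[0-9]+:[0-9]+$/ && ($3 == "Failed" || $3 == "UBad") { print $1 }'
}

# Print DG/VD of every virtual drive whose State column is anything but Optl.
degraded_vds() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && $3 != "Optl" { print $1 }'
}

# Example usage on the ESXi host (not executed here):
#   /opt/lsi/storcli/storcli /c0 show | failed_pds
#   /opt/lsi/storcli/storcli /c0 show | degraded_vds
```

With the listing from step 4, the first filter would print 8:5 and the second 4/4.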
6. Verify once again that you have chosen the correct VD:
/opt/lsi/storcli/storcli /c0/v4 show
[root@hostname:~] /opt/lsi/storcli/storcli /c0/v4 show
Virtual Drives :
-------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC Size     Name
-------------------------------------------------------------
4/4   RAID0 OfLn  RW     No      RWTD  -   ON  1.818 TB VSAN
-------------------------------------------------------------
7. Now delete the failed virtual drive:
/opt/lsi/storcli/storcli /c0/v4 del
[root@hostname:~] /opt/lsi/storcli/storcli /c0/v4 del
Controller = 0
Status = Success
Description = Delete VD succeeded
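Deleting the wrong VD destroys a healthy vSAN disk, so the check in step 6 and the delete in step 7 can be tied together. The sketch below is one possible guard, not part of StorCLI: vd_is_offline is a hypothetical helper that only inspects text on stdin and assumes the "DG/VD TYPE State ..." column order shown above.

```shell
# Succeed only if the virtual drive numbered $1 appears on stdin in state OfLn.
# Intended to read the output of "storcli /c0/vN show".
vd_is_offline() {
    awk -v vd="$1" '
        $1 ~ /^[0-9]+\/[0-9]+$/ {
            split($1, id, "/")            # id[1] = DG, id[2] = VD
            if (id[2] == vd && $3 == "OfLn") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Example usage on the ESXi host (not executed here):
#   STORCLI=/opt/lsi/storcli/storcli
#   if "$STORCLI" /c0/v4 show | vd_is_offline 4; then
#       "$STORCLI" /c0/v4 del
#   else
#       echo "VD 4 is not offline, refusing to delete" >&2
#   fi
```

The guard refuses to proceed when the VD is Optl or when the number does not match, which covers the most likely typo (deleting v5 instead of v4).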
8. Open a ticket with Softlayer support for the hardware failure and paste the PD LIST output as evidence:
PD LIST :
=======
-------------------------------------------------------------------------------------
EID:Slt DID State  DG Size       Intf Med SED PI SeSz Model                  Sp Type
-------------------------------------------------------------------------------------
8:0       9 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:1      20 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:2      11 Onln    1 744.687 GB SATA SSD N   N  512B INTEL SSDSC2BA800G4    U  -
8:3      16 Onln    2 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:4      10 Onln    3 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:5      18 UBad    - 1.818 TB   SATA HDD N   N  512B -                      U  -
8:6      12 Onln    4 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
-------------------------------------------------------------------------------------
9. Wait for confirmation that the disk was replaced.
10. Rescan storage and refresh. Sometimes an ESXi host reboot is required to get correct information about the replaced HDD.
11. If the VD is recreated automatically during the ESXi host reboot, remove it using the commands in steps 5-7 (check which VD number you are going to remove and adjust the commands accordingly).
12. Now run: /opt/lsi/storcli/storcli /c0 show
Finally, PD LIST should look like this (the replaced drive is 8:5):
PD LIST :
=======
-------------------------------------------------------------------------------------
EID:Slt DID State  DG Size       Intf Med SED PI SeSz Model                  Sp Type
-------------------------------------------------------------------------------------
8:0       9 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:1      20 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:2      11 Onln    1 744.687 GB SATA SSD N   N  512B INTEL SSDSC2BA800G4    U  -
8:3      16 Onln    2 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:4      10 Onln    3 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:5      18 UGood   - 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:6      12 Onln    4 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
-------------------------------------------------------------------------------------
13. Create a new VD.
For an HDD (cache mode RWTD):
/opt/lsi/storcli/storcli /c0 add vd type=raid0 name=VSAN drive=8:5 ra wt direct strip=256
For an SSD (cache mode NRWTD):
/opt/lsi/storcli/storcli /c0 add vd type=raid0 name=VSAN drive=8:5 nora wt direct strip=256
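The two commands in step 13 differ only in the read-ahead flag (ra for HDD, nora for SSD). A small helper can build the right command from the drive's slot and the Med column value, which avoids copying the wrong variant. The function name vsan_add_vd_cmd is invented for this sketch, and it only prints the command so it can be reviewed before being run.

```shell
# Print the "add vd" command for one drive, choosing read-ahead by media type.
#   $1 = EID:Slt of the new drive (e.g. 8:5)
#   $2 = media type as shown in the Med column of PD LIST: HDD or SSD
vsan_add_vd_cmd() {
    case "$2" in
        HDD) ra=ra ;;      # read-ahead on  -> cache mode RWTD
        SSD) ra=nora ;;    # read-ahead off -> cache mode NRWTD
        *)   echo "unknown media type: $2" >&2; return 1 ;;
    esac
    echo "/opt/lsi/storcli/storcli /c0 add vd type=raid0 name=VSAN drive=$1 $ra wt direct strip=256"
}

# Example usage (prints the command; pipe to sh only after checking it):
#   vsan_add_vd_cmd 8:5 HDD
```

Printing rather than executing is a deliberate choice here: creating a VD on the wrong slot is hard to undo, so a final visual check is cheap insurance.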
14. If everything was done correctly, you should get a result similar to the one below (VD 5/4 and PD 8:5).
/opt/lsi/storcli/storcli /c0 show
VD LIST :
=======
-------------------------------------------------------------------
DG/VD TYPE  State Access Consist Cache Cac sCC Size       Name
-------------------------------------------------------------------
0/0   RAID1 Optl  RW     Yes     RWBD  -   ON  931.0 GB   RAID1-A
1/1   RAID0 Optl  RW     Yes     NRWTD -   ON  744.687 GB VSAN-SSD
2/2   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
3/3   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
4/5   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
5/4   RAID0 Optl  RW     Yes     RWTD  -   ON  1.818 TB   VSAN
-------------------------------------------------------------------
PD LIST :
=======
-------------------------------------------------------------------------------------
EID:Slt DID State  DG Size       Intf Med SED PI SeSz Model                  Sp Type
-------------------------------------------------------------------------------------
8:0       9 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:1      20 Onln    0 931.0 GB   SATA HDD N   N  512B ST1000NM0033-9ZM173    U  -
8:2      11 Onln    1 744.687 GB SATA SSD N   N  512B INTEL SSDSC2BA800G4    U  -
8:3      16 Onln    2 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:4      10 Onln    3 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:5      18 Onln    5 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
8:6      12 Onln    4 1.818 TB   SATA HDD N   N  512B WDC WD2000FYYZ-01UL1B2 U  -
-------------------------------------------------------------------------------------
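The final check in step 14 boils down to two conditions: the replaced slot is Onln, and no virtual drive is in a state other than Optl. As a rough sketch under the same column-layout assumption as the output above, this can be expressed as a filter whose exit status says whether both hold. The name rebuild_ok is hypothetical.

```shell
# Succeed if the drive in slot $1 (EID:Slt) is Onln and every VD row is Optl.
# Reads "storcli /c0 show" output on stdin; State is assumed to be column 3.
rebuild_ok() {
    awk -v slot="$1" '
        $1 == slot && $3 == "Onln"              { pd_ok = 1 }
        $1 ~ /^[0-9]+\/[0-9]+$/ && $3 != "Optl" { bad_vd = 1 }
        END { exit ((pd_ok && !bad_vd) ? 0 : 1) }
    '
}

# Example usage on the ESXi host (not executed here):
#   /opt/lsi/storcli/storcli /c0 show | rebuild_ok 8:5 && echo "ready for vSAN"
```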
15. Now you can add the new HDD/SSD to the vSAN disk group.
Yevgeniy Steblyanko