Let me just get this out of the way – I’m a HUGE fan of VSAN (aka VMware Virtual SAN). I was first in line to drink the kool-aid when VSAN was nothing but a “what if…?”. Fast forward to the present: the VSAN beta (refresh) is backing my entire lab. I’m tweaking, testing, breaking (learning), and sharing my thoughts on VSAN’s capabilities, performance, and benefits ahead of the official launch. That’s all in good order, because even the beta has exceeded my expectations for what VMware would ship as a 1.0 product.
I could write page after page about the ins and outs of VSAN, but fortunately several very respected individuals have already done so. For starters, Duncan Epping at yellow-bricks.com is not only a massive contributor to the cause, but has also put together a nice list of VSAN resources from around the web that is a must-see. But let’s face it, if you’re tracking VSAN you’ve probably already been there, done that 🙂 So for this post, I’m going to focus instead on my VSAN home lab build and my experiences thus far. I’ve shared several preliminary stats on Twitter (here, here, and here) ahead of any tweaking and will be sure to post additional results as I play with things a bit more.
EZLAB (“EZ” after El-Zein in case you were wondering) has been through somewhat of an overhaul. My original lab was mostly whitebox and was everything I needed at the time, but to play in the home lab big leagues I needed to make some modest investments.
Here’s a logical / solutions overview of the current state of “EZLAB”…
EZLAB Logical Architecture
Physical Lab Specs
Compute:
2 x Dell PowerEdge R710 2U
2 x Dell PowerEdge R610 1U
System Configuration (per host):
– 2 x 4-core Intel Xeon E5620 @ 2.4GHz
– 64GB Memory
– PERC 6/i RAID Controller, no JBOD 🙁
– 1 x Samsung 840 Pro 256GB SSD
– 3 x WD Black 7.2k RPM, 750GB 2.5″ SATA HDD
– 1 x 4PT Broadcom 1Gbps NIC
– 1 x 4PT Intel 1Gbps NIC
– 1 x 2PT InfiniBand 4X DDR HCA @ 10Gbps*
* not yet implemented
Network:
Brocade FCX 24pt 1Gbps + 2pt 10Gbps Switch
Brocade FCX 48pt 1Gbps + 2pt 10Gbps Switch
Qlogic SilverStorm 24pt Infiniband Edge (9024-CU24-ST2)*
* not yet implemented
Storage:
Synology 1511+, 5 x Crucial 128GB SSD + 5 x WD Black 1TB SATA
VMware Virtual SAN (!!)
The Synology 1511+ has been my primary storage solution for a couple of years now…and it has met all my needs for a small environment. The 5 x SSD bays definitely contributed to that. However, with the recent VSAN upgrades, the Synology has taken a back seat of sorts, now providing NFS-based datastores for the nested vESXi hosts in the cloud cluster (at least until I move VSAN in there as well). It also serves as primary backup, media/file server, VPN, OpenLDAP, etc. I love this unit, so I don’t think I’ll be retiring it anytime soon. But needless to say, VSAN has taken over as my primary storage fabric. Speaking of VSAN…
VSAN Config
As I’m sure you’ve heard/witnessed by now, VSAN is a breeze to configure. Once your disks are online and visible to vSphere, you enable VSAN traffic on the appropriate vmk interface and then enable VSAN in the cluster settings. Again, there are so many resources out there that will step you through getting started, the configuration, in-depth details, design considerations, FAQs, other deployment options, and troubleshooting with VSAN Observer…so I’ll spare you those details. This post is not intended to be a how-to guide.
Configuring VSAN in my lab was incredibly straightforward. There are currently 3 hosts participating in my VSAN cluster…the 4th (an R610) is down for maintenance at the moment, but I will be adding it to the mix very shortly.
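If you like double-checking things from the shell (handy when hosts are in and out for maintenance), each ESXi host can report its own VSAN status. This is just a quick per-host sanity check I run; nothing here is required, it only confirms what the Web Client already shows:
esxcli vsan cluster get          # shows whether VSAN is enabled, the cluster UUID, and this host’s membership state
esxcli vsan network list         # confirms which vmkernel interface is tagged for VSAN traffic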
UPDATE: 4th host was added in a follow-up post, Scaling VSAN: Adding a New VSAN Host
Network Configuration
Each of my ESXi hosts is configured with a dedicated vSwitch, a dedicated storage vmk (enabled for VSAN), and 2 physical 1Gbps uplinks (active/standby) for all storage traffic. This will soon be replaced with the InfiniBand fabric, which will add dual 10Gbps HCAs per host. Although VSAN supports both 1Gbps and 10Gbps networks, 10Gbps is highly recommended for scalability and performance. I opted for InfiniBand to keep costs down after reading Erik Bussink’s InfiniBand post.
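For reference, all of that per-host plumbing can also be done from the ESXi shell. The sketch below mirrors my layout, but the vSwitch, vmnic, vmk, and IP values are examples from a single host and will vary in your environment:
esxcli network vswitch standard add -v vSwitch1
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic2
esxcli network vswitch standard uplink add -v vSwitch1 -u vmnic3
esxcli network vswitch standard policy failover set -v vSwitch1 -a vmnic2 -s vmnic3     # vmnic2 active, vmnic3 standby
esxcli network vswitch standard portgroup add -v vSwitch1 -p VSAN
esxcli network ip interface add -i vmk2 -p VSAN
esxcli network ip interface ipv4 set -i vmk2 -t static -I 172.16.10.11 -N 255.255.255.0
esxcli vsan network ipv4 add -i vmk2     # tag vmk2 for VSAN traffic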
VSAN Settings
With the cluster selected, browse to Manage tab –> Settings –> General to enable and configure VSAN. I used VSAN’s manual configuration option to give me full control of which disks are used for the cluster. You can also opt to use the “Automatic” option, which automatically claims and consumes all empty/available local disks for VSAN. A minimum of 1 SSD and at least 1 mechanical disk are required per disk group. In my case I used all the host’s available disks (1 SSD, 3 SATA) in a single disk group per host.
Enabling VSAN
Each host is configured with a single 256GB SSD (used as 70% read cache / 30% write buffer) and 3 x 750GB SATA drives. VSAN sizing best practice recommends roughly a 10:1 HDD-to-SSD capacity ratio, so at 2.25TB of HDD per host the 256GB SSD was right in line. This comes out to 6.13TB of total available capacity once all three hosts were added to the VSAN cluster (6.75TB minus ~9% overhead). The SSD capacity is used strictly for read/write caching and does not contribute to the VSAN datastore’s capacity.
Once VSAN is enabled, the Disk Management section allows you to configure and manage new or existing Disk Groups by claiming available physical disks. Since I opted for the Manual configuration option, this is where I created each Disk Group (1 per host).
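If you’d rather claim disks from the command line, the equivalent per-host step is a single esxcli call. This is just a sketch; the naa identifiers below are placeholders for your actual SSD and SATA device IDs (esxcli storage core device list will show them):
esxcli vsan storage add -s naa.<ssd-device-id> -d naa.<hdd1-device-id> -d naa.<hdd2-device-id> -d naa.<hdd3-device-id>
esxcli vsan storage list     # verify the new disk group and which disks belong to it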
VSAN Disk Management
The RAID-0 Caveat: my hosts each have a PERC 6/i disk controller, which does not support JBOD/pass-through (VSAN needs to see each physical disk individually). To make the individual disks available to VSAN, I had to create a RAID0 volume for each physical disk. This does not significantly impact individual disk performance, but it does affect some of VSAN’s functionality, such as disk hot-swap. For example, with JBOD a failed disk can simply be replaced while the host is powered on, whereas recreating a RAID0 volume for the replacement disk requires a host reboot. In either scenario VSAN will handle rebuilding the disk group.
The other caveat with using RAID0 is SSD presentation: vSphere will not recognize an SSD in a RAID0 volume as a local SSD. The workaround is to ‘fool’ vSphere into treating the SSD’s RAID0 volume as a native SSD device. To do this, I had to SSH into each host and execute the following esxcli commands:
esxcli storage nmp satp rule add -s VMW_SATP_LOCAL -d naa.6842b2b006600b001a6b7e5a0582e09a -o enable_ssd
esxcli storage core claiming reclaim -d naa.6842b2b006600b001a6b7e5a0582e09a
The device “naa.6842b2b006600b001a6b7e5a0582e09a” is the device name that corresponds to the SSD drive (on host ezlab-esx05 in this example). These commands were run on each host against the appropriate SSD device. Once completed, the Drive Type will properly show SSD (no reboot necessary)…and only then can a VSAN Disk Group be created. In case you’re wondering…yes, this procedure can be done on a non-SSD drive for the sake of testing VSAN…but it’s definitely not a recommended practice. The takeaway here: go with JBOD if your controller supports it.
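To confirm the claim rule took effect, you can query the device afterwards (same device ID as above); it should now report “Is SSD: true”:
esxcli storage core device list -d naa.6842b2b006600b001a6b7e5a0582e09a | grep -i "Is SSD"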
VSAN Completed Configuration and Status
You can check each Disk Group’s available capacity (per disk) and operational status in the cluster’s “Monitor” tab…
Monitoring VSAN’s Disk Groups
Once enabled and properly configured, VSAN creates a single datastore, “vsanDatastore”, for the entire cluster. VSAN 1.0 supports only a single datastore per cluster. To take advantage of VSAN and all its glory, the next step was to migrate all my VMs from the Synology to the new datastore.
Storage Policy-Based Management (SPBM)
VSAN enforces various settings and policies by using the SPBM engine on a per-VM basis (or globally if desired). Cormac Hogan covers SPBM in detail as part of his VSAN series, which is a must-read. All VMs that live on a VSAN datastore are assigned either the default policy or a user-specified one. It is recommended to create a policy of your own, even if it simply applies all the default settings. Once a policy is created, it is applied to the VM (to some or all of its attached disks/files) using the vSphere Web Client.
Applying a VM Storage Policy
In exploring the different options of SPBM, I created two policies that provide a balance of availability and performance: “High IO Apps” and “Non-Critical Apps” (the latter follows VSAN’s default policy). To understand the impact and/or benefits of each setting, make sure to read Cormac’s post or the VSAN beta documentation before continuing.
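As a point of reference, you can also see what the default policy looks like from any host in the cluster; this is purely informational and doesn’t change anything:
esxcli vsan policy getdefault     # lists the default policy values applied to each object class (vdisk, vmnamespace, vmswap, etc.)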
The “High IO Apps” policy is sort of ridiculous at the moment with the read cache reservation set to 100%. This is experimental as I’m gauging the impact of this setting on some high-IO apps.
EZLAB Storage Policy “High IO Apps”
The “Non-Critical Apps” policy replicates the default policy and is applied to the majority of the VMs in my environment. These default settings give me a starting point as I experiment with the different options and understand their impact.
EZLAB Storage Policy “Non-Critical Apps”
You can monitor policy status and VM disk/component placement in the vSphere Web Client by selecting the cluster or VM in the left pane and choosing “Virtual SAN” from the “Monitor” tab.
Monitoring VSAN Virtual Disks
Well, there you have it folks…VSAN. One of the most impressive things about this whole experience was being able to migrate completely to this platform without ever having to power down a VM…and with a configuration that took minutes to complete.
Early testing has yielded very impressive results that compete with the performance I was getting out of the all-SSD array, but at a tiny fraction of the cost for 10X the capacity. I will continue to play with VSAN and SPBM as I fine-tune the lab (especially after the 10Gb fabric is installed!)…and I’ll be sure to share the results.
VSAN Iometer Test – 4k size, 100% Read, 100% Random, 2GB object on a single VM.
Be sure to follow me on Twitter to see more test results as I get them out there.
**6.5 UPDATE: I have been playing with VSAN 6.5 for a few months now on 3 x Dell R710s. I will say VSAN on 6.5 seems to perform way better for me than 6.0 did. I am sure this is due to the iSCSI capability in 6.5. I moved my dedicated VSAN NIC ports to dedicated iSCSI switches and wow, what a difference! My other storage is an iSCSI Dell EqualLogic PS array that I need to upgrade (8 years old and still chugging). I have been holding off on getting a new array to look at VSAN technology, as I feel it is the future of enterprise storage. Also, I feel the big-name arrays are way too costly for what you get nowadays. In my setup, I’m running 72 powered-on VMs and I can support loads more. I’m using vCenter 6.5 and all hosts have ESXi 6.5 Enterprise Plus, other than the 2 Cisco UCS Collaboration servers that are still on 5.5. Going forward, I really like what Cisco has. I am considering building out another VSAN with a few Cisco UCS C240 M4 servers, then adding a UCS Mini, an APIC, and a pair of Nexus 9Ks, and using Cisco CloudCenter for all of my management and orchestration, which should make my life easier!
My Setup:
Infrastructure:
– 1 x Dell PowerEdge 4220 Rack Enclosure
– 2 x Dell 2700W UPS Units 3U
– 2 x Dell PDU Units 0U (Side of Rack)
– 1 x APC Transfer Switch 1U (for devices w/ 1 PS)
– 1x APC NetBotz Rack Monitor 570 1U (environment monitoring device)
Compute:
– 3 x Dell PowerEdge R710 2U
– 2 x Cisco C220 M3 1U (For Cisco CUCM/Collab Only PUB & SUB Servers still in vCenter)
R710 System Configuration (per host):
– 2 x Xeon E5504 @ 2.0GHz
– 288GB Memory
– PERC 6/i RAID Controller, RAID 0 each disk
– 6 x Dell Enterprise 2TB 3.5″ SATA 7.2K 6Gb/s HDD
– 1 x HGST Ultrastar SN150 3200GB NVMe PCIe SSD (overkill)
– 1 x 4PT Broadcom BCM5709 1Gbps NIC
– 1 x 4PT Intel 82576 1Gbps NIC
– 1 x iDRAC Enterprise
– 1 x Internal SD Unit (8GB, ESXi 6.5 Dell image)
Network:
– 2 x Cisco 3750G Stack (Core Switches)
– 2 x Cisco 3750E (iSCSI Storage Switches)
– 1 x Cisco 2960S PoE Max (Access Switch)
– 1 x Cisco 3560 (Management Switch)
– 1 x Cisco 4331 ISR (Voice Gateway Router)
– 1 x Cisco 2951 ISR (Edge Router)
Security:
– 2 x Cisco ASA5515-X w/ FirePOWER
– 1 x Cisco S170 Web Security Appliance
– 1 x Cisco C170 Email Security Appliance
Storage:
Dell EqualLogic PS4000 // Dual Controller Modules 4 x 1Gbps // 16x 1TB Dell Enterprise HDD
So I tried again and now have 4 drives in RAID 5 on my Dell R710 for 5.7TB. VSAN sees this, along with the 1 x 2TB drive that I converted to SSD. I add it, but I get an error on all hosts stating the host cannot communicate with other nodes in the Virtual SAN enabled cluster. Is this due to the RAID5 or something else? And if I redid it and converted each drive to its own RAID0, does VSAN offer any fault tolerance?
Good article, I also have a Dell PowerEdge R710 and was wondering why my drive (RAID 5) was not working. I am new to VSAN; if I put the drives in RAID0, does VSAN provide any fault tolerance in case a drive dies?
Great article. I was wondering why VSAN didn’t see my drives (I had them in RAID5). My question is, since you’re doing RAID 0, is there no fault tolerance?
Great read! 🙂
I just installed vSAN in my EMBEDDED lab, but we’ll be going to a full-on new vSAN infrastructure to replace our hosted environment. Hoping that 3 x 400GB high-performance SSDs in each of our 5 hosts (17% SSD of total capacity) will give us crazy fast IOPS for our SQL servers.
"modest investments" ?? Jad which kidney did you sell to get this lab? I see you have omitted the pricing on the lab….must be a reason for that?