Post 3 of 6: Hardware – choosing your infrastructure

Okay, let’s talk hardware. Choosing the right hardware and technologies is essential for the success of your cloud. This post covers the fundamentals of hardware selection for your cloud – pointers to get your brain cells focused on specs vs. brands, compute resources vs. servers…you get the idea. With all this hardware you will eventually have the means of building your reference architecture – but that’s the next post in this series. For the sake of delivering workloads to anyone from anywhere based on expected levels of service, your focus should be on raw performance, efficiency (including total operating cost, overhead, and management), integration, and scalability. One point to remember – the most expensive components won’t necessarily ensure you meet SLAs or make for a faster/better cloud. Cost is often just as important as performance, if not more important, and should be weighed like any other critical criterion. Keep this in mind as you disagree with the forthcoming pointers :)
When I discuss hardware with my customers, I stick to performance levels and expectations, varying levels of integration, and the technologies that will tie the infrastructure together. Choosing the right hardware doesn’t mean using Dell for this and EMC for that (although it might)…it means selecting the hardware components that will ensure you meet SLAs and fulfill your business criteria…period. It is understood that many enterprises have strategic relationships and have standardized on vendors for various components. There is definitely value in that – just as long as the strategic vendors can deliver on their end of the deal. From a virtualization perspective, your workloads expect resources. vSphere delivers resources – agnostically and heterogeneously – from the underlying hardware. Give me network, compute, and storage, and we’ll deliver the appropriate resources to meet SLAs. With that said, before you log in to order the components needed to build the greatest whitebox hypervisor host for use in your private cloud, please take a moment to review the Hardware Compatibility List. Moving on…
Cloud aside, one core objective of virtualization is driving the VM-to-host ratios up as much as possible to reach ~70% utilization while reducing infrastructure size/sprawl. The more resources a single host provides, the higher these ratios can get. For new installations, choose the latest multi-core CPUs and the highest memory capacity possible. I’ll often opt for larger servers with 4 or more multi-core CPUs vs. a larger number of dual-socket hosts for higher VM-to-host ratios. Six cores? Even better. Don’t worry about putting many eggs in one hypervisor basket – vSphere will take care of those VMs with HA, DRS, and vMotion. That said, it is perfectly acceptable if a larger quantity of smaller servers makes you more comfortable. The sweet spot these days seems to be dual-CPU hosts with 8-12 cores. Whether dual-socket or quad, aim for the maximum cores per CPU. This will get your ratios up and help you stretch your vSphere licenses (a single vSphere ENT Plus license covers up to 12 cores per CPU).
Memory resources tend to run out much quicker than CPU. To help mitigate that, I use the 4-to-1 rule of thumb. That’s 4GB RAM per 1 CPU core. With this rule, a host with 4 x 6-core CPUs would be configured with at least 96GB RAM for average workloads. If you feel this is too conservative, opt for larger memory modules (16GB or 8GB vs. 4GB) to free up slots for future expansion (although this can be costly).
I wouldn’t be telling the whole story if I claimed many cores + a ton of memory = performance…there are many subcomponents and technologies that will help drive performance. Take a closer look at bus speeds, CPU cache, memory type, etc. If you have an opportunity to choose components at this level, take advantage of it and aim high…every component counts and can help drive ratios and performance. If you’re a mega-geek and want to learn how vSphere’s scheduler can take advantage of this capability, read my post titled “Caching In” and associated whitepaper.
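For those who like to see the arithmetic, here is a minimal sketch of the 4-to-1 memory rule in Python. The function names are my own for illustration, and the module sizes are just the ones mentioned above; treat it as a starting point for average workloads, not a sizing tool.

```python
GB_PER_CORE = 4  # the 4-to-1 rule of thumb: 4 GB RAM per CPU core


def recommended_ram_gb(sockets: int, cores_per_socket: int,
                       gb_per_core: int = GB_PER_CORE) -> int:
    """Minimum host RAM (GB) under the GB-per-core rule of thumb."""
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be positive")
    return sockets * cores_per_socket * gb_per_core


def dimm_slots_used(ram_gb: int, module_gb: int) -> int:
    """Slots consumed when reaching ram_gb with modules of module_gb (rounded up)."""
    if module_gb < 1:
        raise ValueError("module_gb must be positive")
    return -(-ram_gb // module_gb)  # ceiling division without floats


if __name__ == "__main__":
    ram = recommended_ram_gb(sockets=4, cores_per_socket=6)
    print(f"4 x 6-core host: at least {ram} GB RAM")
    for module_gb in (4, 8, 16):
        slots = dimm_slots_used(ram, module_gb)
        print(f"  {module_gb} GB modules -> {slots} slots used")
```

For the 4 x 6-core host this gives 96GB, which takes 24 slots with 4GB modules but only 6 with 16GB modules. That difference is the expansion headroom you are paying for.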
Let’s switch gears to bottleneck…err, I mean Storage. If storage is a religion for you, as it is for so many people I encounter in my career, go right ahead and skip this section and [cautiously] do what you think is right. I run into storage-related performance issues more than anything else. Let me be blunt – you can’t deliver IOPS your spindles don’t have. Nor will 500 spindles do you any good over a storage fabric that can’t handle the bandwidth. And overprovisioning is NOT the right answer. It’s no secret that virtualization drives the need for shared storage. And when you consolidate your workloads, your storage architecture will define performance or uncover pain points. Please plan accordingly.
You should have an idea of the IOPS requirements of each individual workload, host, and cluster. Take some time to set average and peak baselines – this will help you size your storage capacity and performance accurately. With that said, realize I’m not suggesting you run out and buy racks and racks of 15K SAS or FC disk. You may benefit greatly (financially) by going with a greater number of 10K SAS or even :::gasp::: 7200RPM SATA disks, especially if your focus is capacity vs. throughput. Our goal is to deliver the appropriate level of performance and availability. You can meet any IOPS requirement with any one of these spindle speeds if done properly. Other considerations include cache size, RAID technology used, and number of spindles per RAID group. Do some planning to determine whether you need performance vs. capacity (or both) and a little bit of math for sizing. Next you’ll want to ensure the storage fabric bandwidth – whether Ethernet-based (NFS, iSCSI, FCoE) or FC – will suffice based on your calculated workload requirements. For all other considerations (LUN sizing, number of VMs per datastore, etc.) take a look at VMware’s iSCSI or Fibre Channel SAN Configuration Guides to keep you on the right track.
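The “little bit of math” can be sketched roughly as follows. Fair warning: the per-disk IOPS figures and RAID write penalties below are common rules of thumb that I am assuming for illustration, not vendor numbers, and the function names are hypothetical. Plug in your own measured baselines before trusting any result.

```python
import math

# Assumed rule-of-thumb figures for illustration only; replace them with
# measured or vendor-supplied numbers for your own arrays.
DISK_IOPS = {"7.2k SATA": 80, "10k SAS": 130, "15k SAS": 180}
RAID_WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}


def backend_iops(frontend_iops: float, write_fraction: float, raid: str) -> float:
    """IOPS the disks must absorb once the RAID write penalty is applied."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    reads = frontend_iops * (1.0 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * RAID_WRITE_PENALTY[raid]


def spindles_needed(frontend_iops: float, write_fraction: float,
                    raid: str, disk: str) -> int:
    """Smallest disk count whose combined IOPS covers the back-end load."""
    load = backend_iops(frontend_iops, write_fraction, raid)
    return math.ceil(load / DISK_IOPS[disk])


def fabric_mb_per_s(iops: float, io_size_kb: float) -> float:
    """Approximate fabric throughput (MB/s) for a given IOPS and I/O size."""
    return iops * io_size_kb / 1024.0


if __name__ == "__main__":
    peak, writes, raid = 2000, 0.25, "RAID 5"
    print(f"Back-end load: {backend_iops(peak, writes, raid):.0f} IOPS")
    for disk in DISK_IOPS:
        print(f"  {disk}: {spindles_needed(peak, writes, raid, disk)} spindles")
    print(f"Fabric: {fabric_mb_per_s(peak, 8):.1f} MB/s at 8 KB I/O")
```

With these assumed figures, a 2000 IOPS peak at 25% writes on RAID 5 turns into 3500 back-end IOPS, which works out to 20 x 15K SAS, 27 x 10K SAS, or 44 x 7200RPM SATA disks. Any of the three can meet the requirement; the question is which one makes sense for your capacity needs and budget.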
Consider technologies that optimize access and disk IO – you may be able to reduce back-end IOPS (and spindles) by placing some cache up front. This is especially true when you’re working with repetitive workloads, as is the case with virtual desktop infrastructures. Most importantly, work with your vendors and express your “needs” and “wants” based on what you’ve determined to be feasible and, if the pocketbook permits, take advantage of some of the unique technologies each has to offer.
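To put a rough number on the cache point, here is a small sketch that assumes a simple model: every read served from cache never reaches the disks. The hit ratio, the 180 IOPS per disk, and the function names are all assumptions for illustration; real cache behavior depends heavily on the workload and the array.

```python
import math

ASSUMED_DISK_IOPS = 180  # assumed rule-of-thumb figure for a 15K disk


def backend_read_iops(frontend_read_iops: float, cache_hit_ratio: float) -> float:
    """Read IOPS that still reach the disks after the cache absorbs its share."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return frontend_read_iops * (1.0 - cache_hit_ratio)


def read_spindles(frontend_read_iops: float, cache_hit_ratio: float,
                  disk_iops: int = ASSUMED_DISK_IOPS) -> int:
    """Disks needed to serve the reads that miss the cache."""
    return math.ceil(backend_read_iops(frontend_read_iops, cache_hit_ratio) / disk_iops)


if __name__ == "__main__":
    reads = 5000  # hypothetical read-heavy burst, e.g. many desktops booting
    for hit_ratio in (0.0, 0.5, 0.75):
        print(f"hit ratio {hit_ratio:.2f}: {read_spindles(reads, hit_ratio)} spindles")
```

Under these assumptions, a 75% hit ratio takes a 5000 IOPS read burst from 28 spindles down to 7, which is why up-front cache pays off so well for repetitive workloads.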
I’ll leave you with this…the days of customers specifying physical hardware requirements are over. In terms of cloud, we are providing a level of abstraction that will enable you to deliver SLAs based on customer requirements and not component specs. Your focus should be on provisioning enough physical resources to meet performance and availability levels and not so much on the specific hardware details. If a customer is asking for a peak of 2000 IOPS for a given workload, it is up to you to deliver that level of performance regardless of whether it’s coming from 20 x SATA or 10 x 15K SAS disks. This can be a shift in how you operate today, but as we aim for efficiency and IT as a service, providing this level of abstraction will be key to your success.
In this Series:
3 – Hardware!! – choosing your infrastructure 
6 – Get a Grip! – managing your cloud