Caching In - the magic behind vSphere's CPU scheduler

One of the most important objectives of virtualizing a new or existing infrastructure is efficiency…both operational and financial. Virtualization wouldn’t be where it is today without a means of getting the most bang for the buck and clearly demonstrating the value-add of system consolidation, whether within your labs, server rooms, datacenters, or across the entire enterprise. To justify virtualization projects of any significance you have to hit your leadership where it hurts (tickles)…the corporate wallet. What better way to do that than consistently reducing your project’s acquisition and operational costs?

Of course I’m not suggesting you’ll be swimming in cash (or a nice bonus) the moment you deploy your first hypervisor, although it is a first step in the right direction. Witnessing a rack of 10 x 2U servers reduced to a single host (I’m being conservative), while centralizing management and often increasing performance, is nothing short of wonderful. How about consolidating 100 of these same servers into a single rack? 100 loaded racks into 10? Enough said. VMware’s value proposition is very clear in this arena. In keeping with my promise of no sales pitches, I’ll spare you the ROI/TCO chatter. Just consider this: the cost of maintaining 100 legacy servers is drastically greater than the cost of acquiring 10 brand new uber-hosts sporting the latest chipsets, energy efficiency, memory/CPU capacity, and all the necessary vSphere licensing. Ask me to explain the math if you’re interested.

If I were to stop right here, the title of this post would be “Cashing In”, but of course that isn’t the only objective of this post…

vSphere added some pretty incredible things under the covers to help achieve the VM-to-host ratios we enjoy while delivering industry-leading hypervisor performance. At the helm is ESX’s CPU scheduler…technology that drives efficiency. The CPU scheduler is where a lot of the magic happens. Its primary role is to schedule access to physical CPU resources, orchestrating virtualized guest access to potentially dozens of cores across multiple sockets, all of which can be based on an SLA or user requirement. This is no easy task: providing CPU access to a guest OS with one or more vCPUs is a bit more complicated than scheduling a native process within an OS. The scheduler needs to ensure each guest perceives that it has exclusive, uninterrupted access to the resources it thinks it owns, for the OS and all of its processes. To achieve this, ESX uses a proportional-share based algorithm to dynamically allocate available physical CPU (pCPU) resources to the guest VMs. The scheduler also utilizes “gang scheduling”, where related threads (here, the vCPUs of a multi-vCPU VM) are scheduled to run at the same time for increased performance, among many other techniques.
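To make the proportional-share idea a little more concrete, here is a minimal sketch in Python. It is not ESX’s actual implementation; the VM names, share values, and the fixed 1 ms quantum are all made up for illustration. The only idea it borrows is that each VM is entitled to CPU time in proportion to its shares, so the scheduler keeps picking whichever VM has consumed the least time relative to its shares.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str              # hypothetical VM name
    shares: int            # relative weight; values below are illustrative
    consumed: float = 0.0  # CPU time received so far, in ms


def pick_next(vms):
    """Pick the VM that is furthest behind its proportional entitlement."""
    # A low consumed/shares ratio means the VM has had less than its fair cut.
    return min(vms, key=lambda vm: vm.consumed / vm.shares)


def simulate(vms, quanta, quantum_ms=1.0):
    """Hand out `quanta` time slices on a single pCPU, one at a time."""
    for _ in range(quanta):
        vm = pick_next(vms)
        vm.consumed += quantum_ms
    return {vm.name: vm.consumed for vm in vms}


# Three contending VMs with a 2:1:1 share ratio (assumed values).
result = simulate(
    [VM("web", 2000), VM("db", 1000), VM("batch", 1000)],
    quanta=400,
)
print(result)
```

With every VM constantly demanding CPU, the time handed out converges on the 2:1:1 share ratio. A real scheduler also has to account for reservations, limits, idle VMs, and multiple pCPUs, none of which this toy model attempts.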

A significant change between ESX 4 and previous versions is the ability for a guest’s vCPUs to span multiple physical sockets. Previously, a VM with multiple vCPUs could only be scheduled within a cell: a VM with 4 vCPUs (this was the max in ESX 3.x) on a host with quad-core pCPUs would be scheduled only on a single physical socket (i.e. the cell). The implementation of cell scheduling made a lot of sense at the time, but now that ESX supports 8 vCPUs (and growing), it has become a constraint. The ESX (4.x) scheduler is no longer limited to scheduling a VM with multiple vCPUs within a single socket, since the cell model is no longer a factor. ESX is now capable of scheduling vCPUs belonging to a single VM on physical cores across multiple physical sockets, a decision based on several factors including user preference (SLAs), reservations, VM priorities, fairness, load balancing, and the proportional-share algorithm. A significant performance boost is attributed to the ability to utilize last-level cache (LLC) and the high-speed memory bus between physical sockets for core scheduling in ways never before possible. One example of how cache is utilized is a feature called inter-VM cache affinity, where VMs within a physical host can share cache based on how frequently they communicate with each other. Another feature, called vSMP consolidate, enables the scheduler to determine whether or not multiple vCPUs within a VM can benefit from addressing the same LLC.
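The cell constraint is easy to picture with a toy placement check. The Python sketch below is my own simplification, not how ESX decides placement: it treats each socket as a cell, assumes we only know how many cores are free per socket, and uses hypothetical function names. It simply contrasts “all vCPUs must fit in one cell” with “vCPUs may spread across sockets”.

```python
def fits_in_one_cell(free_cores_per_socket, vcpus):
    """Cell-style rule: every vCPU of the VM must land in a single socket."""
    return any(free >= vcpus for free in free_cores_per_socket)


def fits_across_sockets(free_cores_per_socket, vcpus):
    """ESX 4-style rule (simplified): vCPUs may span multiple sockets."""
    return sum(free_cores_per_socket) >= vcpus


# A host with two quad-core sockets, all cores free (assumed layout).
host = [4, 4]

four_vcpu_cell = fits_in_one_cell(host, 4)      # fits inside one socket
eight_vcpu_cell = fits_in_one_cell(host, 8)     # no single socket has 8 cores
eight_vcpu_span = fits_across_sockets(host, 8)  # fits once sockets can be spanned
print(four_vcpu_cell, eight_vcpu_cell, eight_vcpu_span)
```

On this host the 4 vCPU VM fits under either rule, while the 8 vCPU VM only fits once the scheduler is allowed to span sockets. The real scheduler weighs the factors listed above (shares, reservations, fairness, load balancing, cache locality) rather than simply counting free cores.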

There are many of these little treasures under the covers of vSphere, all with the objective of efficiency, performance, setting precedent, and staying ahead of the game. If you really want to geek out on this stuff, take a look at the vSphere CPU Scheduler white paper at http://www.vmware.com/resources/techresources/10059. Now you have an idea of how VMware achieves those industry-leading virtualization ratios and performance metrics. But if you’re still playing the “which platform should I choose?” game, take the time to set up small clusters of each to determine for yourself why vSphere stands alone…in small environments or at a massive scale.

++++

@virtualjad
