There is a lot of reluctance and resistance to virtualizing MS SQL server workloads on either the Microsoft Hyper-V or the VMWare ESX platforms. Although I agree that not all SQL workloads are suitable candidates for virtualization, in some instances it may be the preferable option.
VMware recently updated their guidelines on virtualizing tier 1 applications with MS SQL being one of them. VMware suggests that the key points to consider before virtualizing a Tier 1 SQL Server are:
1. Virtual CPU
For most lower tiered database workloads a reasonable amount of over-commitment, i.e. ratio of pCPU (or physical CPU cores) to vCPU (virtual CPU) actually increases aggregate throughput. This combined with vMotion and DRS can allow for better resource load balancing.
However, in the case of Tier 1 SQL workloads as far as possible avoid pCPU over-commitment. A few things to look for at the are:
%RDY – %RDY or CPU Ready percentage is the metric that describes the amount of time a virtual machine waits in the queue in a ready-to-run state before it can be scheduled on a CPU. A deep dive on CPU Ready percentage (%RDY) is a discussion for another day, however the lower %RDY metric the better and the generally acceptable threshold is around 10% but aim for 5% or less.
%MLMTD – available only through esxtop, is the percentage of time the VM was ready to run but wasn’t scheduled because it would violate the CPU Limit set either on the guest or the resource pool the guest belongs to. Limits can artificially raise the %RDY time and lead to performance degradation Look for and eliminate limits set on guests and resource pools for your Tier 1 database.
%CSTP – it is a measure of the percentage of time a vCPU in an SMP virtual machine is “stopped” from executing, so that another vCPU in the same virtual machine could be run to “catch-up” and make sure the skew between the two virtual processors doesn’t grow too large. In most cases a high %CSTP value is a good indicator that the number of vCPUs on a VM has been over-provisioned. In some cases a high %CSTP value can also be seen during VM snapshot activities. Take a look at the snapshot schedule and try and avoid taking snapshots during busy periods.
NUMA Node sizes: For smaller guests, keep the number of vCPUs less than or equal to the number of cores in a physical NUMA node. For example if each NUMA node consists of 6 cores size your VMs as a multiple of your physical server’s NUMA node size, i.e. 2, 3 or 6 vCPUs.
For larger VMs (with ESXi 5), align vCPUs to physical NUMA boundaries and enable vNUMA on ESX to allow SQL Server NUMA optimization.
2. Virtual Memory
- Avoid memory over commit until vSphere vCentre reports that the steady state utilization of the memory is below the amount of physical memory available on the physical server.
For mission critical workloads it is advisable to use memory reservations to avoid memory ballooning or memory swapping. Initially set the memory reservation equal to allocated memory. This can then be tweaked lower based on active usage patterns, however if possible leave the reservation as is.
- If memory over-commitment is being enabled, closely monitor memory use and determine whether the virtual machines are ballooning and/or swapping. Through esxtop look at:
MCTLSZ: which displays the amount of guest physical memory reclaimed by the balloon driver
SWCUR: which displays the current Swap Usage
SWAP MB read/sec and write/sec
- If there is any memory ballooning ensure there are no limits set on memory utilization or lower the memory over-commit by reducing the memory allocated to the servers
Keep an eye on the Active Memory counter, but use this with caution. Always confirm actual usage by taking a look at the memory counter in perfmon within the guest
Do not set memory limits on virtual guests or resource pools
- Follow VMware best practice and use separate redundant NICs to drive management traffic (vMotion, Fault tolerance, vmKernel), virtual machine traffic and storage network traffic
- Allocate at least 2 NICs per vSwitch to leverage NIC teaming capabilities
- As far as possible try and use the VMXNET3 para-virtualized virtual network adapters in the guest VMs
- If deploying MS SQL Server on Windows Server 2008 R2 built using VMware templates, install MS KB2344941 to avoid seeing the ghost NIC (issue only arises if VMXNET3 adapters are used)
- As far as possible try and use the para-virtualized SCSI driver for I/O intensive SQL Server volumes. There is a known issue with the use of PVSCSI driver on Windows Server 2008 R2.
- Although this has been fixed in vSphere 5.0 Update 1 & vSphere 5.1, use this patch from VMware (KB2004578) to resolve this issue for earlier vSphere deployments.
- Try and dedicated VMFS volumes for Tier-1 SQL Server files
- Ensure that disk partitions are aligned at the VMFS and guest OS level
- Deploy multiple virtual SCSI adapters to distribute workload and provide redundancy
- Provision all tier 1 SQL server VMDK as eagerzeroedthick. This will zero out all blocks used by the VMDK on first access. Although this will cause a time penalty during server provisioning it will improve performance as it will not incur the additional latency of zeroing on first write.
- Separate SQL Binary, data, log and tempdb files on to individual VMDKs
- Use multiple smaller LUNs (preferably on separate aggregates) for better manageability and performance