Tuesday, November 12, 2013

vSphere ESXi - Low CPU usage and High CPU ready


Recently, I’ve been receiving more requests for VMs with multiple vCPUs. 

A quick look in vCenter shows that the CPU utilization of the ESXi hosts in the cluster is low. However, you need to dig a bit deeper and take a good look at your CPU ready performance stats to get the entire picture.

**Low ESXi CPU utilization DOES NOT necessarily mean your CPU ready times are acceptable**

CPU ready is the time that a VM is “ready” to run but cannot be scheduled because of a lack of physical CPU resources. This typically occurs when the physical CPUs on an ESXi host are oversubscribed.

The low CPU usage and high CPU ready condition can be caused by one or more “Monster VMs” configured with more vCPUs than the ESXi host can handle. This in turn creates a bottleneck for the remaining VMs on the host.

vCenter reports CPU ready as a summation in milliseconds, while esxtop reports it as a percentage (%RDY).

Keep in mind that the %RDY number is the sum of the %RDY values of all vCPUs for a given VM. So, 10 %RDY for a VM configured with 1 vCPU would mean there is serious contention for CPU resources. However, if the VM were configured with 4 vCPUs, it would mean that each vCPU spends on average 2.5 percent of the time waiting to be scheduled. In the case of a 4 vCPU virtual machine, the max possible %RDY is 400%.

Conversions (approximate, for the vCenter real-time chart, which samples every 20 seconds):
1% (%RDY) = 200ms
5% (%RDY) = 1000ms
10% (%RDY) = 2000ms
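
The conversions above can be reproduced with a couple of lines of arithmetic. Here is a minimal sketch, assuming the 20-second sample interval of the real-time chart; the function names are my own and not part of any VMware tool.

```python
# Sketch only: function names are hypothetical, not a VMware API.
# Assumes the vCenter real-time chart, which samples every 20 seconds.

def ready_ms_to_percent(ready_ms, interval_s=20):
    """Convert a CPU ready summation (ms) to a percentage of the sample interval."""
    return ready_ms / (interval_s * 1000) * 100


def percent_to_ready_ms(percent, interval_s=20):
    """Convert a %RDY value back to milliseconds of ready time per sample."""
    return percent / 100 * interval_s * 1000


for pct in (1, 5, 10):
    print(f"{pct}% (%RDY) = {percent_to_ready_ms(pct):.0f}ms")
# 1% (%RDY) = 200ms
# 5% (%RDY) = 1000ms
# 10% (%RDY) = 2000ms
```

If you read the value from a chart with a different sample interval, pass that interval in seconds instead of the default.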

%RDY rough guidelines:
<2.5%    OK
5%       Contention
10%      Serious Contention
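
Since %RDY is summed across vCPUs, I find it easiest to divide by the vCPU count before comparing against the guidelines. The sketch below does that. The function names are hypothetical, and the "Borderline" label for values between 2.5% and 5% is my own assumption, since the guidelines do not name that range.

```python
# Sketch only: names are hypothetical. Thresholds come from the rough
# guidelines above and are applied per vCPU (my reading of them).

def per_vcpu_ready(total_rdy_percent, vcpus):
    """Average %RDY per vCPU, given the VM-level (summed) %RDY."""
    return total_rdy_percent / vcpus


def classify(per_vcpu_percent):
    """Map a per-vCPU %RDY value onto the rough guideline labels."""
    if per_vcpu_percent >= 10:
        return "Serious Contention"
    if per_vcpu_percent >= 5:
        return "Contention"
    if per_vcpu_percent < 2.5:
        return "OK"
    return "Borderline"  # assumption: the guidelines leave 2.5% to 5% unnamed


print(classify(per_vcpu_ready(10, 1)))  # Serious Contention
print(classify(per_vcpu_ready(8, 4)))   # OK
```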

VMware's performance troubleshooting guide puts the CPU ready threshold at <1000ms per vCPU.

vCPUs Allocated    CPU Ready Threshold for VM
1                  1000ms
2                  2000ms
4                  4000ms
8                  8000ms
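
The table is simply 1000ms multiplied by the vCPU count. A minimal sketch of that check follows, with hypothetical function names, assuming the ready value is a summation in milliseconds read from the real-time chart.

```python
# Sketch only: names are hypothetical. The 1000ms-per-vCPU figure is the
# threshold quoted above; the ready value is assumed to be a real-time
# chart summation in milliseconds.

READY_MS_PER_VCPU = 1000


def ready_threshold_ms(vcpus):
    """VM-level CPU ready threshold in ms for a given vCPU count."""
    return READY_MS_PER_VCPU * vcpus


def exceeds_threshold(ready_ms, vcpus):
    """True when the VM's ready summation is at or above its threshold."""
    return ready_ms >= ready_threshold_ms(vcpus)


for vcpus in (1, 2, 4, 8):
    print(f"{vcpus} vCPU: {ready_threshold_ms(vcpus)}ms")

print(exceeds_threshold(2500, 2))  # True
print(exceeds_threshold(2500, 4))  # False
```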
 

**Update June 2021** - Recently, I've updated my method to determine if my CPU Ready figures are within an acceptable range.  Below is a fantastic calculator to easily determine if your VM is suffering from contention:

1. Obtain the summation value from vCenter. No need to SSH into the box. I always use the max value.

2. Use VMCalc.com to determine if you are within the acceptable range. 

3. In this example, I used the data for the “Last Day”, entered the Max summation value and number of vCPUs. 
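
If you prefer to do the math yourself, the sketch below applies the summation-to-percent conversion that VMware documents for its performance charts. I am not claiming this is exactly what VMCalc.com computes. The function name and the period labels are my own, and the intervals are the default chart update intervals, which may differ if your statistics settings have been changed.

```python
# Sketch only: function name and period labels are hypothetical.
# Intervals are the default vCenter chart update intervals, in seconds.
CHART_INTERVAL_S = {
    "Real-time": 20,
    "Last Day": 300,
    "Last Week": 1800,
    "Last Month": 7200,
    "Last Year": 86400,
}


def ready_percent_per_vcpu(summation_ms, vcpus, period="Last Day"):
    """Convert a CPU ready summation (ms) to an average %RDY per vCPU."""
    interval_ms = CHART_INTERVAL_S[period] * 1000
    return summation_ms / interval_ms * 100 / vcpus


# Example with made-up numbers: a max summation of 15000ms on a
# 4 vCPU VM, read from the "Last Day" chart.
print(f"{ready_percent_per_vcpu(15000, 4, 'Last Day'):.2f}% per vCPU")
# 1.25% per vCPU
```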


To resolve this issue:
1. Confirm that all your VMs are "Right Sized". Use vCenter/vCenter Operations/Turbonomic to see if any VMs are over-provisioned. Reduce vCPUs as needed.
2. Enable DRS on the cluster to have VMware migrate the VMs to the appropriate host.
3. For Dell R810s, confirm that power management is set to "Maximum Performance". For R820s, set the System Profile to "Performance". (This killed us on a SQL cluster.)
4. If your cluster is overcommitted, you may need to upgrade or add an additional host.
