In the spring of 2015, the Savills IT organization faced a major issue that needed immediate attention: its VMware virtual machines were suffering from higher-than-normal disk latency, which was degrading overall performance and extending backup windows into production hours.
Two of the classic virtualization performance problems are HBA-LUN queue contention and disk latency, and Savills was affected by both. Queue contention is caused by excess SCSI commands clogging the queue, while disk latency problems occur when the controller issues an overwhelming number of physical I/Os to the SAN and the disks can no longer respond in acceptable times. The two problems go hand in hand, since the number of SCSI commands directly affects the disk I/O count. VMware contends that latency of 15ms should be monitored and that latency over 30ms is a cause for concern. VMware’s Storage I/O Control (SIOC) throttles the LUN queue when latency hits 30ms, thereby trading throughput for lower latency.
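As a rough illustration of the thresholds quoted above, the following Python sketch sorts latency readings into three bands. The function name, the band labels and the sample readings are hypothetical and chosen only for illustration; the 15ms and 30ms cut-offs are the only values taken from the text, and the sketch assumes that a reading of exactly 30ms still falls in the "monitor" band.

```python
# Thresholds quoted in the text; everything else here is illustrative.
MONITOR_MS = 15.0   # latency at or above this should be monitored
CONCERN_MS = 30.0   # latency above this is a cause for concern


def classify_latency(latency_ms: float) -> str:
    """Return 'ok', 'monitor' or 'concern' for one latency reading in ms."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms > CONCERN_MS:
        return "concern"
    if latency_ms >= MONITOR_MS:
        return "monitor"
    return "ok"


if __name__ == "__main__":
    # Hypothetical readings, not measurements from the Savills environment.
    for reading in (8.0, 15.0, 31.0, 100.0):
        print(f"{reading:6.1f} ms -> {classify_latency(reading)}")
```

Classifying individual readings rather than a single average matters because an acceptable-looking mean can hide peaks that fall well into the "concern" band.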
Savills noted that its disk latency was affecting SQL, SharePoint and Exchange performance. Average latency was 15ms, but frequent peaks of more than 100ms were hurting productivity and causing long backup overruns.
The IT staff realized that the situation was likely to worsen unless the Windows guest systems were properly maintained.