SSDs Help Your Microsoft HCI Solution Scale

By Tony Ansley - 2018-12-04

Last month, in my blog entitled “Micron’s All-Flash SATA Reference Architecture for Microsoft® Hyper-V and Storage Spaces Direct Continues to Illustrate SSD Benefits for HCI,” I announced Micron’s updated reference architecture (RA) for Microsoft Hyper-V with Storage Spaces Direct using Micron’s 5200 SATA TLC SSDs. In that update, we showed how SSDs can provide excellent small-block I/O and large-block throughput in a single-tiered solution. The RA explains how to build the solution and includes a performance analysis of the final configuration. I encourage everyone to download the RA and read the results for yourselves.

In this blog, I want to discuss VM scaling. While building our latest RA, we ran a series of tests on the same hardware configuration described in the RA to better understand how the cluster performed as we increased the VM load on each node.

The Test Configuration

To test the behavior as we increased the VM count within the cluster, we ran a series of tests using VMFLEET to launch the target number of VMs, each running one instance of the diskspd load generator. This means that each VM was running a single thread of execution for the test.

We ran both small-block (4 KiB) and large-block (128 KiB) tests identical to those executed in the RA referenced above.

Each VM was configured as described in the RA.


We scaled from one to 80 VMs per node (320 VMs for the cluster). The maximum tested configuration equates to one VM per logical CPU core. Since each VM was configured with two virtual CPUs (vCPUs), vCPUs were oversubscribed by a factor of two at that maximum. This oversubscribed configuration allowed us to understand how the solution scaled once we ran out of logical cores.
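The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration. The 80 logical cores per node and the four-node cluster are inferred from the figures quoted above (one VM per logical core at 80 VMs per node, 320 VMs in total), and the function name is hypothetical.

```python
# Illustrative sizing arithmetic. The constants are inferred from the text,
# not copied from the RA's configuration tables.
VCPUS_PER_VM = 2              # stated: two vCPUs per VM
LOGICAL_CORES_PER_NODE = 80   # inferred: one VM per logical core at 80 VMs per node
NODES = 320 // 80             # inferred: 320 cluster VMs / 80 VMs per node


def vcpu_to_core_ratio(vms_per_node: int) -> float:
    """Allocated vCPUs on one node divided by that node's logical cores."""
    return vms_per_node * VCPUS_PER_VM / LOGICAL_CORES_PER_NODE


for vms in (1, 20, 40, 60, 80):
    print(f"{vms:>2} VMs/node = {vms * NODES:>3} VMs in cluster, "
          f"vCPU-to-core ratio {vcpu_to_core_ratio(vms):.2f}")
```

Under these assumptions the ratio reaches 1.0 at 40 VMs per node and 2.0 at 80 VMs per node.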

For our analysis, we took the following data into account:

  • I/Os per second (IOPS)
  • Throughput in gigabytes per second (GB/s)
  • CPU utilization consumed as a percentage
  • Average latency in milliseconds
  • Quality of service (QoS) latency at 99.999% in milliseconds
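To make the difference between average latency and 99.999% QoS latency concrete, here is a minimal sketch of a nearest-rank percentile in Python. It is not how the load generator or the RA computed the published numbers. The function name and the sample data are invented for illustration.

```python
import math


def nearest_rank_percentile(latencies_ms, pct):
    """Smallest sample such that at least pct% of all samples are <= it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # round() guards against floating-point noise such as 99999.00000000001
    rank = math.ceil(round(pct * len(ordered) / 100, 6))
    return ordered[max(rank, 1) - 1]


# Made-up samples: 99,998 fast I/Os plus two slow outliers.
samples = [1.0] * 99_998 + [40.0, 500.0]
average = sum(samples) / len(samples)

print(f"average latency : {average:.3f} ms")
print(f"99.999% latency : {nearest_rank_percentile(samples, 99.999)} ms")
print(f"maximum latency : {max(samples)} ms")
```

A handful of slow I/Os barely moves the average but dominates the 99.999% figure, which is why the two metrics are reported separately.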

The Results

Small-block test results show rapid scaling from one to 30 VMs per node across all read/write ratios. Once all logical CPU cores on the physical processors were allocated to VMs, scalability leveled off at just over 500K IOPS. CPU utilization as measured in the VMs leveled off at around 15% to 16% as we scaled beyond 40 VMs, which coincides with the physical CPUs being fully allocated.

The charts below illustrate IOPS performance, CPU utilization and latency for a typical 70% read I/O profile, with the number of VMs along the bottom axis.


Figure 1: VM Scalability for 4KiB 70% Read Workloads - Performance and CPU Utilization


Figure 2: VM Scalability for 4KiB 70% Read Workloads - Latency

Large-block test results also show consistent scaling from one to 80 VMs per node. In both read and write workloads, performance increases with queue depth across all VM loads, while CPU utilization stays extremely low at no more than 3%. As discussed in the RA, we ran our large-block workload as random rather than sequential I/O. In a highly varied virtualized environment, even if individual VMs are performing sequential I/O internally, all I/O is effectively random at the hypervisor level because the storage system must interleave the I/O streams of many disparate VMs.

The charts below illustrate throughput (MB/s), CPU utilization and latency (ms) for 100% read and 100% write profiles, with the number of VMs along the bottom axis.


Figure 3: VM Scalability for 128KiB 100% Read Workloads - Performance and CPU Utilization


Figure 4: VM Scalability for 128KiB 100% Write Workloads - Performance and CPU Utilization


Figure 5: VM Scalability for 128KiB Workloads – Average Latency

Observations and Conclusions

The observations below are based solely on the synthetic I/O workload used for the testing. Each application is unique in how it performs I/O, so your actual results and conclusions may differ from those below. Our goal with these tests and observations is simply to provide a starting point for your own analysis of your VM and application needs.

There are several observations that can be made based on the small-block results:

  • For 70/30 mixed read/write workload, CPU utilization is very low at around 15%-16% for small-block I/O.
  • As writes increase, CPU utilization decreases.
  • Performance (IOPS) levels out quickly as the number of VMs exceeds 20 per node.
  • Latency-sensitive workloads should consider lower queue depths.
  • A queue depth of 2 appears to be the best compromise for quick, high-performance transaction needs, with an average latency of 0.97 ms and a QoS latency of 628 ms while delivering over 500K IOPS.

There are several observations that can be made based on the large-block results:

  • Large-block workloads seem to benefit from higher queue depths.
  • Depending on the VM application requirements, latency may or may not be a deciding factor.
  • Latency is relatively low at high VM levels for heavy read workloads, at under 30 ms.
  • There is a distinct increase in latency above 60 VMs per node for writes and for read workloads at higher queue depths.

Other general observations:

  • As would be expected, CPU utilization is lower as block size increases since more data is moved per I/O.
  • Several factors to consider:
    • This server configuration used a SAS RAID controller (in HBA passthrough mode) in an x16 PCIe slot. This limits throughput to 4 GB per second.
    • Testing was done with a replication factor of three. This means that every write operation at an individual VM results in SSD writes on three different cluster nodes.
    • Networking overhead increased as writes increased because the replication factor requires data to be sent to at least two additional nodes.
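The effect of the replication factor on back-end load can be sketched as simple arithmetic. The function below is hypothetical. The 500,000 IOPS at a 70/30 mix is used purely as an example input, and the network figure is a lower bound that assumes one copy is written on the node hosting the VM.

```python
def replicated_write_load(vm_write_iops: int, replication_factor: int = 3):
    """Return (SSD writes/s across the cluster, minimum network copies/s)."""
    ssd_writes = vm_write_iops * replication_factor
    # Lower bound: assumes one copy lands locally and the rest cross the network.
    min_network_copies = vm_write_iops * (replication_factor - 1)
    return ssd_writes, min_network_copies


# Example input only: 500,000 VM-level IOPS at a 70/30 read/write mix.
total_iops = 500_000
write_iops = total_iops * 30 // 100

ssd_writes, network_copies = replicated_write_load(write_iops)
print(f"VM-level writes      : {write_iops:,} per second")
print(f"SSD writes (3 copies): {ssd_writes:,} per second")
print(f"Network copies (min) : {network_copies:,} per second")
```

Every additional percentage point of writes in the mix therefore costs three SSD writes and at least two network transfers per VM-level write, which is consistent with the networking overhead noted above.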

Final Thoughts

Micron’s HCI RA for Microsoft Hyper-V with Storage Spaces Direct highlights the value of SSDs for virtualized environments. With high performance from lower-cost SATA SSDs, this solution provides over 1.5 million IOPS (6 GiB per second) of small-block I/O with low CPU utilization. Additionally, this RA shows excellent VM scalability, with over 500,000 70/30 IOPS (2 GB/s) across a broad range of VM loads.
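For readers who want to relate the IOPS and throughput figures, here is a small sketch of the conversion, assuming the 4 KiB block size used in the small-block tests. The function name is hypothetical, and the inputs are the rounded "over" figures quoted above rather than measured values.

```python
KIB = 1024  # bytes in one KiB


def throughput_bytes_per_s(iops: int, block_kib: int) -> int:
    """Bytes per second moved at a given IOPS rate and block size."""
    return iops * block_kib * KIB


for label, iops in (("peak small-block", 1_500_000), ("70/30 mix", 500_000)):
    bps = throughput_bytes_per_s(iops, block_kib=4)
    print(f"{label:<16}: {iops:>9,} IOPS = "
          f"{bps / 1e9:.2f} GB/s = {bps / 2**30:.2f} GiB/s")
```

Because the quoted IOPS figures are lower bounds, the results land slightly off the rounded throughput numbers, and the GB versus GiB distinction accounts for a further difference of about 7%.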

For the full details, download one of Micron’s RAs to see how we built our solution and learn how you can take advantage of Micron’s high-performance SATA and NVMe SSDs within your Microsoft HCI solutions.

Tony Ansley

Tony is a 34-year technology leader in server architectures and storage technologies and their application in meeting customers’ business and technology requirements. He enjoys fast cars, travel, and spending time with family, not necessarily in that order.