2 Million IOPS in 4U: Micron 9200 MAX + Red Hat Ceph Storage 3.0 Reference Architecture Block Performance

By Ryan Meredith - 2018-04-16

Hi everybody!

OpenStack Summit 2018 is fast approaching, and Micron has a new reference architecture to support OpenStack™ cloud storage and content distribution. In this blog post, I’ll share the performance of our new Purley-based Ceph reference architecture (RA), featuring the fastest Micron NVMe drive available: the 9200 MAX SSD (6.4TB).

Our new reference architecture uses Red Hat Ceph Storage (RHCS) 3.0, based on Ceph Luminous (12.2.1). Testing in the RA is limited to FileStore performance, since FileStore is the currently supported storage back end for RHCS 3.0.

Spoiler alert: Purley + RHCS 3.0 + the Micron 9200 MAX is stupid fast.

This solution is optimized for block performance. Small-block random testing using the RADOS Block Device (RBD) in Linux saturates the two Intel Xeon Platinum 8168 (Purley) processors in each 2-socket storage node.

With 10 drives per storage node, this architecture has a usable storage capacity of 232TB, and it can be scaled out by adding more 1U storage nodes.
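
The 232TB figure can be reproduced with a little arithmetic. The sketch below assumes four 1U storage nodes (implied by the 4U in the title and by the 80 OSDs described in the test methodology) and drive capacity quoted in decimal terabytes. Under those assumptions, 232 matches the raw capacity expressed in binary units (TiB) before 2x replication is applied; that is one possible reading of the number, not a statement from the RA.

```python
# Capacity arithmetic for the reference architecture.
# Assumptions: 4 storage nodes (4U of 1U nodes), 10 x 6.4TB drives per node,
# drive capacity in decimal terabytes (10**12 bytes).

NODES = 4            # assumed: "4U" title and 80 OSDs / (10 drives x 2 OSDs)
DRIVES_PER_NODE = 10
DRIVE_TB = 6.4       # Micron 9200 MAX, decimal TB
REPLICAS = 2         # pool replication factor used in testing

TB = 10**12          # decimal terabyte
TIB = 2**40          # binary tebibyte

raw_bytes = NODES * DRIVES_PER_NODE * DRIVE_TB * TB
raw_tb = raw_bytes / TB
raw_tib = raw_bytes / TIB
# Each object is stored REPLICAS times, so client-visible space is divided by REPLICAS.
after_replication_tib = raw_tib / REPLICAS

print(f"raw: {raw_tb:.1f} TB = {raw_tib:.1f} TiB")
print(f"after {REPLICAS}x replication: {after_replication_tib:.1f} TiB")
```

The same arithmetic scales linearly: each additional 1U node adds 10 x 6.4TB of raw flash.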

Reference Design – Hardware

[Figure: reference design hardware, SuperMicro switches]

Test Results and Analysis

Ceph Test Methodology

Ceph is configured using FileStore with 2 OSDs per Micron 9200 MAX NVMe SSD, and each OSD uses a 20GB journal. With 4 storage nodes, 10 drives per node and 2 OSDs per drive, Ceph has 80 OSDs in total and 232TB of usable capacity. The Ceph pool under test was created with 8192 placement groups and 2x replication.
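
A quick sketch of what this configuration implies per OSD and per drive. The node count of 4 is inferred from the other numbers rather than stated directly, and "PG copies per OSD" simply counts each placement group once for every replica it has; nothing here reflects a specific Ceph tuning rule.

```python
# Derive OSD count, placement-group density, and journal footprint from the
# FileStore configuration described in the methodology.

NODES = 4                 # inferred: 80 OSDs / (10 drives x 2 OSDs per drive)
DRIVES_PER_NODE = 10
OSDS_PER_DRIVE = 2
JOURNAL_GB_PER_OSD = 20
PG_COUNT = 8192
REPLICAS = 2

osds = NODES * DRIVES_PER_NODE * OSDS_PER_DRIVE
# Each PG is placed on REPLICAS OSDs, so the average OSD hosts this many PG copies.
pgs_per_osd = PG_COUNT * REPLICAS / osds
# Two FileStore journals share each NVMe drive.
journal_gb_per_drive = OSDS_PER_DRIVE * JOURNAL_GB_PER_OSD
journal_gb_per_node = DRIVES_PER_NODE * journal_gb_per_drive

print(f"OSDs: {osds}")
print(f"PG copies per OSD: {pgs_per_osd}")
print(f"journal per drive: {journal_gb_per_drive} GB, per node: {journal_gb_per_node} GB")
```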

4KB random block performance was measured using FIO against the RADOS Block Device (RBD). 100 RBD images were created at 75GB each, resulting in a dataset of 7.5TB (15TB with 2x replication).
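
The post does not include the FIO job files, so the following is only a sketch of what a per-client job might look like. It assumes fio's librbd engine (`ioengine=rbd`, with its `clientname`, `pool` and `rbdname` options); the RA may instead have driven kernel-mapped RBD devices. The pool name `rbd`, the cephx user `admin`, the `imageNNN` naming scheme and the `fio_job` helper are all hypothetical.

```python
# Build a fio job file for one FIO client driving one RBD image, and check the
# dataset size quoted in the methodology.
from typing import Optional

IMAGE_COUNT = 100
IMAGE_GB = 75
REPLICAS = 2

def fio_job(image_index: int, rw: str, iodepth: int, rwmixread: Optional[int] = None) -> str:
    """Return fio job-file text for a 4KB random workload against one RBD image (hypothetical helper)."""
    lines = [
        "[global]",
        "ioengine=rbd",          # assumption: librbd engine rather than krbd
        "clientname=admin",      # hypothetical cephx user
        "pool=rbd",              # hypothetical pool name
        "bs=4k",
        f"rw={rw}",              # randread, randwrite or randrw
        f"iodepth={iodepth}",
    ]
    if rwmixread is not None:
        lines.append(f"rwmixread={rwmixread}")  # e.g. 70 for the 70/30 mix
    lines += [
        "",
        f"[image{image_index:03d}]",
        f"rbdname=image{image_index:03d}",      # hypothetical image naming scheme
    ]
    return "\n".join(lines) + "\n"

dataset_tb = IMAGE_COUNT * IMAGE_GB / 1000   # client-visible data, decimal TB
stored_tb = dataset_tb * REPLICAS            # data written to disk with 2x replication

print(fio_job(0, "randrw", 32, rwmixread=70))
print(f"dataset: {dataset_tb} TB, stored: {stored_tb} TB")
```

Generating one job per image keeps each FIO client pinned to its own 75GB image, which is one straightforward way to spread load across all 100 images.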

RBD FIO 4KB Random Read Performance

4KB random read performance scales from 289k IOPS up to 2 million IOPS. Ceph reaches maximum CPU utilization at a queue depth of 16; increasing the queue depth to 32 doubles average latency while only marginally increasing IOPS.
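
Why does queue depth 32 double latency? By Little's law, average latency equals I/Os in flight divided by IOPS. With the client count fixed, I/Os in flight scale with queue depth, so once the CPUs are saturated and IOPS stop growing, doubling the queue depth roughly doubles latency. A minimal sketch; the 5% IOPS gain in the example is a hypothetical value, not a measured one.

```python
# Little's law: average latency = outstanding I/Os / throughput (IOPS).
# With a fixed number of clients, outstanding I/Os are proportional to queue depth.

def latency_ratio(qd_from: int, qd_to: int, iops_from: float, iops_to: float) -> float:
    """Ratio of average latency at qd_to versus qd_from, holding client count fixed."""
    return (qd_to / qd_from) * (iops_from / iops_to)

# Hypothetical illustration: IOPS grow only 5% when going from QD16 to QD32.
print(round(latency_ratio(16, 32, 1.00, 1.05), 2))
```

If IOPS did not grow at all, the ratio would be exactly 2; any small IOPS gain pulls it slightly below 2.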

RBD FIO 4KB Random Write Performance

4KB random writes reach a maximum of 375k IOPS at 100 clients. Average latency rises linearly with the number of clients, reaching 8.5ms at 100 clients.

4KB random writes hit the best balance of IOPS and latency at 60 FIO clients: 363k IOPS at 5.3ms average latency. At this point, average CPU utilization on the Ceph storage nodes is over 90%, which limits further performance gains.
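
Little's law (I/Os in flight = IOPS x average latency) offers a quick consistency check on the write numbers. The sketch below plugs in the two reported data points; both imply roughly 32 I/Os in flight per FIO client, which suggests, though the post does not say so, that each client ran at queue depth 32. The function name is hypothetical.

```python
# Little's law check on the reported 4KB random write results.

def implied_queue_depth(iops: float, latency_ms: float, clients: int) -> float:
    """I/Os in flight per client implied by IOPS and average latency (hypothetical helper)."""
    outstanding = iops * (latency_ms / 1000)  # total I/Os in flight across all clients
    return outstanding / clients

results = {
    # clients: (IOPS, average latency in ms), as reported above
    60: (363_000, 5.3),
    100: (375_000, 8.5),
}

for clients, (iops, latency_ms) in results.items():
    print(f"{clients} clients: ~{implied_queue_depth(iops, latency_ms, clients):.1f} I/Os in flight per client")
```

Because in-flight I/Os grow with client count while IOPS plateau near CPU saturation, latency has to grow roughly in proportion to the number of clients, matching the linear ramp described above.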

RBD FIO 4KB Random 70% Read / 30% Write Performance

70% read / 30% write testing scales from 211k IOPS at queue depth 1 to 837k IOPS at queue depth 32. Read and write average latencies are graphed separately; the maximum average read latency is 2.71ms and the maximum average write latency is 6.41ms.
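
For a rough sense of how the mixed workload breaks down, the sketch below splits the 837k IOPS result by the 70/30 ratio and computes an I/O-weighted average latency. It assumes the achieved mix is exactly 70/30 and that both maximum latencies occur at the same queue depth (presumably 32); neither is stated explicitly, and 837k is itself a rounded figure, so the outputs are approximate.

```python
# Split the 70/30 result into read and write IOPS and blend the two latencies.

READ_FRACTION = 0.70
total_iops = 837_000
max_read_ms = 2.71
max_write_ms = 6.41

read_iops = total_iops * READ_FRACTION
write_iops = total_iops * (1 - READ_FRACTION)
# Average latency over all I/Os: each class weighted by its share of operations.
blended_ms = READ_FRACTION * max_read_ms + (1 - READ_FRACTION) * max_write_ms

print(f"reads: {read_iops:,.0f} IOPS, writes: {write_iops:,.0f} IOPS")
print(f"blended average latency: {blended_ms:.2f} ms")
```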

Would You Like to Know More?

RHCS 3.0 + the Micron 9200 MAX NVMe SSD on the Intel Purley platform is very fast. You can learn more about our testing and results in the reference architecture. I will present our RA and other Ceph tuning and performance topics during my session at OpenStack Summit 2018. More on that to come. Stay tuned!

Have additional questions about our testing or methodology? Leave a comment below or email us.

Ryan Meredith

Ryan Meredith is a senior manager of Storage Solutions Engineering at Micron. He's worked in enterprise storage since 2007 for U.S. Bank, IBM and Gemalto. He currently leads a team focused on architecting and performance testing enterprise solutions using Micron's DRAM and flash technologies. He likes dogs, games, travel and scuba diving.

Ryan has a Master of Science in management information systems from the University of South Florida.