
Ceph BlueStore: To cache or not to cache, that is the question

John Mazzie | March 2019

To cache or not to cache, that is the question.

Well, do you? Cache for your Ceph® cluster? The answer is: it depends.

You can use high-end enterprise NVMe™ drives, such as the Micron® 9200 MAX, and not have to worry about getting the most performance from your Ceph cluster. But what if you would like to get more performance out of a system made up mostly of SATA drives? In that case, there are benefits to adding a couple of faster drives to your Ceph OSD servers to store the BlueStore database and write-ahead log.
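
To illustrate what that looks like in practice, here is a minimal sketch of creating a BlueStore OSD with ceph-volume so the object data lives on a SATA drive while the RocksDB database and write-ahead log are carved off to NVMe. The device paths and partition layout below are hypothetical placeholders, not the exact layout used in this testing.

# Sketch only: build one BlueStore OSD with its DB/WAL on faster NVMe media.
# Device paths are hypothetical placeholders.
import subprocess

def create_osd(data_dev: str, db_dev: str, wal_dev: str) -> None:
    """Invoke ceph-volume to create a BlueStore OSD with DB/WAL on NVMe."""
    subprocess.run(
        [
            "ceph-volume", "lvm", "create", "--bluestore",
            "--data", data_dev,      # SATA device (or LV) holding the object data
            "--block.db", db_dev,    # NVMe partition for the BlueStore/RocksDB database
            "--block.wal", wal_dev,  # NVMe partition for the write-ahead log
        ],
        check=True,
    )

if __name__ == "__main__":
    # e.g., one 5210 ION for data, two partitions of a 9200 MAX for DB and WAL
    create_osd("/dev/sdb", "/dev/nvme0n1p1", "/dev/nvme0n1p2")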

Micron developed and tested the popular Accelerated Ceph Storage Solution, which leverages servers with Red Hat Ceph Storage running on Red Hat Enterprise Linux. I will walk through a few workload scenarios and show where caching can help, based on actual results from our solution testing lab.

System Configuration

Testing was done using a four-node Ceph OSD cluster with the following configuration:

 Processor                     Single-socket AMD EPYC 7551P
 Memory                        256GB DDR4-2666 (8x 32GB)
 Networking                    100 GbE
 SATA Drives                   Micron 5210 ION 3.84TB (x12)
 NVMe Drives (cache devices)   Micron 9200 MAX 1.6TB (x2)
 OS                            Red Hat® Enterprise Linux 7.6
 Application                   Red Hat Ceph Storage 3.2
 OSDs per SATA Drive           2
 Dataset                       50 RBDs @ 150GB each with 2x replication

Table 1: Ceph OSD Server Configuration
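
For reference, a dataset like the one in Table 1 could be laid out roughly as in the sketch below. The pool name, placement-group count, and image names are illustrative assumptions, not the values used in our lab.

# Sketch: a 2x-replicated pool holding 50 RBD images of 150GB each.
# Pool name and PG count are assumptions for illustration.
import subprocess

POOL = "rbd_bench"   # hypothetical pool name
PGS = "4096"         # PG count would be sized to the actual cluster

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

run("ceph", "osd", "pool", "create", POOL, PGS, PGS, "replicated")
run("ceph", "osd", "pool", "set", POOL, "size", "2")              # 2x replication
run("ceph", "osd", "pool", "application", "enable", POOL, "rbd")

for i in range(50):                                               # 50 RBDs @ 150GB each
    run("rbd", "create", f"{POOL}/img{i:02d}", "--size", "150G")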

4KiB Random Block Testing

For 4KiB random writes using FIO (Flexible I/O tester), you can see that adding caching drives greatly increases performance while keeping tail latency low, even at high load. At 40 FIO instances, performance is 71% higher (190K vs. 111K IOPS) and tail latency is 72% lower (119ms vs. 665ms).



Figure 1: 4KiB Random Write Performance and Tail Latency
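
For context, a single FIO instance in this test might be invoked roughly as in the sketch below; the pool and image names, runtime, and queue depth are assumptions, and the results above used up to 40 such instances, each driving its own RBD image. For the 70% read/30% write results shown later, the job would use rw=randrw with rwmixread=70 instead.

# Sketch of one FIO instance for the 4KiB random-write test, driving an RBD
# image directly through librbd. Names and settings are illustrative.
import subprocess

fio_cmd = [
    "fio",
    "--name=rbd-4k-randwrite",
    "--ioengine=rbd",        # requires fio built with rbd support
    "--clientname=admin",
    "--pool=rbd_bench",      # hypothetical pool from the earlier sketch
    "--rbdname=img00",
    "--rw=randwrite",        # randrw + rwmixread=70 for the mixed workload
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=1",
    "--time_based",
    "--runtime=600",
]
subprocess.run(fio_cmd, check=True)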

There is some performance gain during 4KiB random read testing, but it is much less pronounced. This is to be expected: during a read test the write-ahead log is not used, and the BlueStore database changes little, if at all.


Figure 2: 4KiB Random Read Performance and Tail Latency

A mixed workload (70% read/30% write) also shows the benefit of having caching devices in your system. Performance gains range from 30% at a queue depth of 64 to 162% at a queue depth of 6.


Figure 3: 4KiB Random 70% Read/30% Write Performance and Tail Latency

4MiB Object Testing

When running the rados bench command with 4MiB objects, there is some performance gain with caching devices, but it is not as dramatic as with the small-block workloads. Since the write-ahead log is small and the objects are large, adding caching devices has much less impact on performance. With 10 instances of rados bench, throughput is 9% higher with caching (4.94 GiB/s vs. 4.53 GiB/s) and average latency is 7% lower (126ms vs. 138ms).

Figure 4: 4MiB Object Write Performance
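
A single instance of that test might look roughly like the sketch below; the pool name, duration, and thread count are assumptions, and ten such instances ran in parallel for the numbers above.

# Sketch: one rados bench instance writing 4MiB objects; a matching 'seq'
# read pass can replay the same objects afterwards.
import subprocess

subprocess.run(
    ["rados", "bench", "-p", "rbd_bench", "600", "write",
     "-b", "4194304",     # 4MiB object size, in bytes
     "-t", "16",          # concurrent operations per instance
     "--no-cleanup"],     # keep objects so a read pass can follow
    check=True,
)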

With reads, we again see that there is negligible performance gain across the board.


Figure 5: 4MiB Object Read Performance

Conclusion

As you can see, if your workload is almost all reads, you won't gain much, if anything, from adding caching devices for BlueStore database and write-ahead log storage to your Ceph cluster. With writes, it is a completely different story. Although there is some gain for large objects, the real standout for caching devices is small-block writes and mixed workloads. For the small investment of adding a couple of Micron 9200 MAX NVMe drives to your system, you can get the most out of your Ceph cluster.

What sorts of results are you getting with your open source storage? Learn more at Micron Accelerated Ceph Storage.

Stay up to date by following us on Twitter @MicronStorage and connect with us on LinkedIn.

John Mazzie

MTS, Systems Performance Engineer

John is a Member of the Technical Staff in the Data Center Workload Engineering group in Austin, TX. He graduated from West Virginia University in 2008 with an MSEE with an emphasis in wireless communications. John worked for Dell on the MD3 series of storage arrays, on both the development and sustaining sides. He joined Micron in 2016 and has worked on Cassandra, MongoDB, Ceph, and other advanced storage workloads.