I recently presented at a Micron webinar with Forrester Consulting titled “AI Matters: Getting to the Heart of Data Intelligence with Memory and Storage.” My co-presenter, Wes Vaske, recently wrote a blog discussing some of his research into the role of memory and storage in enabling AI workloads. Here I want to expand on that story with a focus on memory.
One of the key findings from Forrester’s study is that when it comes to building an AI-capable infrastructure, the availability and performance of memory was the number one concern of the system architects and IT professionals who were surveyed. Ninety percent of respondents said they have near-term plans to re-architect their systems to bring memory and compute closer together. One way to do this is to move AI workloads off the CPU and onto a graphics card or other AI accelerator that makes use of near memory. The evolution of AI hardware accelerators has created demand for high-performance near memory solutions. It is well documented that AI accelerators can achieve order-of-magnitude performance gains over CPUs for many AI workloads. What is not as well documented is the role that near memory plays in accomplishing these impressive leaps in performance.
Figure 1: The “Roofline Model” for analyzing performance bottlenecks of applications in a system1-2
Figure 1 shows a representation of a roofline model. This model is often used by system architects and application developers to analyze the performance bottlenecks of various applications in a system. The Y axis represents compute operations per second, and the X axis represents compute operations per byte of data accessed from memory, often called arithmetic or computational intensity. The maximum compute power of a system sets the horizontal part of the roofline, and the maximum memory bandwidth sets the slanted part of the roofline.
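To make the shape of the model concrete, here is a minimal sketch in Python of the attainable-performance calculation behind the roofline. The peak compute and bandwidth figures are illustrative assumptions, not measurements of any particular accelerator.

```python
# A minimal sketch of the roofline calculation described above. The peak
# compute and memory bandwidth numbers are illustrative placeholders, not
# measurements of any particular accelerator.

def attainable_performance(intensity_flops_per_byte,
                           peak_compute_gflops,
                           peak_bandwidth_gbs):
    """Attainable performance (GFLOP/s) at a given arithmetic intensity.

    The slanted part of the roofline is bandwidth * intensity; the
    horizontal part is the compute peak. Attainable performance is
    whichever roof is lower.
    """
    return min(peak_compute_gflops,
               peak_bandwidth_gbs * intensity_flops_per_byte)

# Sweep intensity to trace out the shape of the roofline.
for intensity in (0.5, 1, 2, 4, 8, 16, 32, 64):
    perf = attainable_performance(intensity,
                                  peak_compute_gflops=10_000,  # assumed 10 TFLOP/s peak
                                  peak_bandwidth_gbs=500)      # assumed 500 GB/s bandwidth
    print(f"{intensity:5.1f} FLOP/byte -> {perf:8.1f} GFLOP/s")
```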
An application can be plotted as a point on this chart to visualize performance bottlenecks. Memory-bound AI applications, which bump against the slanted part of the roofline, are unlikely to benefit from a system with more compute power unless the memory bandwidth increases as well. As each new generation of AI accelerator dramatically raises the compute roofline, more pressure is placed on the memory interface to prevent bottlenecks. This is why Micron is investing in the next generation of near memory technologies, including GDDR and HBM. The next generation of AI accelerators will require next-generation memory devices with more bandwidth than ever before.
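Continuing that sketch, the short example below (again using assumed, illustrative numbers and a hypothetical AI kernel) classifies a workload as memory- or compute-bound and shows why raising the compute roof alone leaves a memory-bound workload stuck until bandwidth rises too.

```python
# Classifying a hypothetical workload against the same roofline. All numbers
# are assumptions for illustration only.

def bound_by(intensity, peak_compute_gflops, peak_bandwidth_gbs):
    """Return whether a kernel at this intensity is memory- or compute-bound."""
    ridge_point = peak_compute_gflops / peak_bandwidth_gbs  # FLOP/byte where the roofs meet
    return "compute-bound" if intensity >= ridge_point else "memory-bound"

app_intensity = 4.0  # assumed FLOP/byte for a hypothetical AI kernel

# Raise the compute roof, then the bandwidth roof, and compare.
for compute, bandwidth in [(10_000, 500), (40_000, 500), (40_000, 2_000)]:
    attainable = min(compute, bandwidth * app_intensity)
    print(f"peak {compute:>6} GFLOP/s, {bandwidth:>5} GB/s -> "
          f"{bound_by(app_intensity, compute, bandwidth)}, attainable {attainable:,.0f} GFLOP/s")

# At 4 FLOP/byte the kernel stays memory-bound: quadrupling peak compute
# leaves attainable performance at 2,000 GFLOP/s until bandwidth also rises.
```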
Figure 2: The AI landscape from the perspective of a memory and storage provider
A second key point that was discussed in the webinar is that although AI workloads will continue to exist in the datacenter and cloud, future growth will likely be focused on the edge and endpoint applications. Figure 2 shows the AI landscape from the perspective of a memory and storage provider.
The heterogeneous datacenter, with many AI accelerator options including GPU, FPGA and ASIC, will continue to demand high-performance, high-density main memory and near memory. Development of next-generation memory like DDR5, GDDR6 and HBM2E will be critical to enabling the demanding AI workloads of tomorrow in the datacenter. On the other end of the AI landscape, memory choices for intelligent endpoint devices are more likely to be driven by constraints like power, form factor, cost, and reliability in harsh conditions, making LPDDR an attractive choice.
Between the datacenter and the endpoint is the “smart edge,” an exciting area of AI that is likely to experience rapid growth due to the concept of “data gravity,” which pulls workloads that would typically run in the datacenter onto edge devices that are physically closer to where data is collected.
An example of this would be a smart internet gateway device that can intelligently decide what data to send to the data center and what to discard. The smart edge will likely lead to new and innovative AI compute systems, creating the need for new classes of memory devices and for differentiation in how memory is integrated into an AI accelerated system. In this rapidly evolving landscape, Micron understands how important it is to have a diverse portfolio of memory and storage products to enable architects to optimize their systems, wherever they are deployed.
I hope that listeners of the webcast came away with the message that memory and storage matter when it comes to AI. At Micron, we believe that memory and storage are the heart of AI, enabling the lifeblood of data to flow to the processors and storing the complex parameters of the deep learning models that power AI decision making.
This is why Micron is investing in next generation memory and storage, collaborating with AI systems developers, and participating in AI thought leadership. At Micron, our vision is to transform how the world uses information to enrich life.
We believe that AI has incredible potential to enrich life, and we want to be the memory and storage consultant on the systems that will enable the future applications of AI everywhere from the datacenter to the smart edge to the intelligent endpoint.
1Jouppi, N. P., et al., 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. ISCA.
2Williams, S., Waterman, A., and Patterson, D., 2009. Roofline: An Insightful Visual Performance Model for Multicore Architectures. Communications of the ACM.