Micron has been making news with exciting announcements across its memory and storage portfolios that accelerate AI growth. Our HBM3E 8-high and 12-high solutions offer industry-leading performance at 30% lower power consumption than the competition.1 The Micron HBM3E 8-high 24GB product will ship in the NVIDIA H200 Tensor Core GPU. On a recent episode from Six Five Media, hosts Daniel Newman (CEO of Futurum Group) and Patrick Moorhead (CEO of Moor Insights & Strategy) sat down with Girish Cherussery, senior director of Product Management at Micron. Together, they explored the fascinating world of high-bandwidth memory (HBM) and its applications in today’s technology landscape. This article recaps their conversation, covering everything from the intricacies of HBM to how Micron is meeting market demands and what’s currently unfolding in the memory ecosystem. Girish also provided valuable insights for audiences keen to stay informed about market trends in AI memory and storage technology.
What is high-bandwidth memory, and what is it used for?
HBM, an industry-standard in-package memory, is a game-changer. It offers the highest bandwidth for a given capacity in the smallest footprint, all while being energy efficient. As Girish pointed out in the Six Five podcast, AI applications increasingly deploy complex large language models (LLMs), and training these models is challenging because GPU-attached memory capacity and bandwidth are limited. LLM sizes have grown exponentially, far outpacing the growth in memory capacity, and that gap underscores the growing need for more memory.
Consider GPT-3, which had about 175 billion parameters. That translates into approximately 800 gigabytes of memory, along with a need for higher bandwidth to prevent performance bottlenecks. The latest GPT-4 model has significantly more parameters (estimated in the trillions). Traditional methods of simply adding more memory components result in prohibitively expensive systems.
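A rough back-of-the-envelope calculation shows where a figure of roughly 800GB comes from. The sketch below assumes about 4 bytes per parameter plus some headroom for activations; these are illustrative assumptions, not Micron’s or OpenAI’s exact accounting.

```python
# Rough, illustrative estimate of an LLM's memory footprint. The bytes per
# parameter and overhead factor are assumptions, not Micron methodology.
def model_memory_gb(num_params, bytes_per_param=4.0, overhead=1.15):
    """Approximate memory needed to hold a model.

    bytes_per_param: 4 for FP32, 2 for FP16/BF16 (assumed).
    overhead: headroom for activations, KV cache, etc. (assumed).
    """
    return num_params * bytes_per_param * overhead / 1e9

# GPT-3 scale: ~175 billion parameters.
print(f"{model_memory_gb(175e9):.0f} GB")  # ~805 GB, in line with the ~800GB cited above
```

Even at half precision (2 bytes per parameter), the weights alone would occupy roughly 350GB, far more than the memory attached to any single GPU.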
HBM offers an efficient solution. Micron HBM3E stacks eight or twelve industry-leading 1β (1-beta) technology-based 24Gb die in an 11mm x 11mm package, delivering a higher capacity of 24GB or 36GB in a smaller footprint. Micron’s leading-edge design and process innovations enable HBM3E’s memory bandwidth of over 1.2 TB/s at a pin speed greater than 9.2 Gb/s. HBM3E has 16 independent, high-frequency data channels, akin to highway lanes, as Girish said, that move data back and forth faster and deliver the needed performance.
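The headline numbers follow directly from the cube geometry and the standard 1024-bit HBM interface (16 channels of 64 bits each). The short check below uses 9.6 Gb/s as an assumed example of a pin speed “greater than 9.2 Gb/s”; it is a sanity check, not a spec sheet.

```python
# Sanity check of the HBM3E capacity and bandwidth figures cited above.
# The 1024-bit interface (16 channels x 64 bits) is the standard HBM width;
# 9.6 Gb/s is an assumed example of a pin speed "greater than 9.2 Gb/s".

die_capacity_gb = 24 / 8                           # a 24Gb die holds 3 GB
print(8 * die_capacity_gb, 12 * die_capacity_gb)   # 24.0 GB and 36.0 GB per cube

channels = 16                            # independent data channels
bits_per_channel = 64                    # 16 x 64 = 1024-bit interface
pin_speed_gbps = 9.6                     # assumed; the article says > 9.2 Gb/s
bandwidth_tb_s = channels * bits_per_channel * pin_speed_gbps / 8 / 1000
print(f"{bandwidth_tb_s:.2f} TB/s")      # ~1.23 TB/s, i.e. over 1.2 TB/s
```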
The higher capacity and bandwidth of Micron HBM3E shorten the time needed to train LLMs, yielding significant OpEx savings for customers. The higher capacity also supports larger language models and helps avoid CPU offload and GPU-to-GPU communication delays.
HBM3E is highly energy efficient because the data path between the host and memory is short. The DRAM communicates with the host through through-silicon vias (TSVs), which Girish likens to a toothpick through a burger: they carry power and data from the bottom die up to the top memory layer. Micron’s HBM3E consumes 30% less power than the competition thanks to advanced CMOS technology innovation on the 1β process node and advanced packaging innovations, with up to twice the number of TSVs coupled with a 25% shrink in package interconnects. This 30% lower power consumption at 8 Gb/s per memory instance delivers more than $123 million in OpEx savings to a customer over five years for a 500,000 GPU install base.1,2
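The structure of such an OpEx estimate is simple to sketch, even though Micron’s published $123 million figure comes from internal models (footnote 2). Every input in the example below, including the cubes per GPU, watts saved per cube, electricity price and cooling overhead, is a hypothetical placeholder chosen only to illustrate the order of magnitude.

```python
# Purely illustrative OpEx sketch; every input is a hypothetical placeholder,
# not a Micron figure (Micron's $123M estimate comes from internal models).
gpus = 500_000              # install base from the article
cubes_per_gpu = 6           # assumed HBM cubes per GPU
watts_saved_per_cube = 6.0  # assumed absolute savings from ~30% lower power
hours = 5 * 365 * 24        # five-year horizon
usd_per_kwh = 0.10          # assumed electricity price
pue = 1.5                   # assumed data center power/cooling overhead

energy_saved_kwh = gpus * cubes_per_gpu * watts_saved_per_cube * hours / 1000
opex_savings = energy_saved_kwh * usd_per_kwh * pue
print(f"${opex_savings / 1e6:.0f}M saved over five years")  # order-of-magnitude only
```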
Thus, as Daniel Newman put it, Micron’s HBM3E is the biggest, fastest and coolest memory, with a positive impact on the sustainability needs of data centers.
How does Micron HBM3E respond to the demands of generative AI and high-performance computing?
At Micron, we believe in solving problems that address fundamental human challenges, enriching the lives of all.
Supercomputer simulations today drive huge memory and bandwidth needs. As Girish explains, pharmaceutical companies were urgently trying to identify new drugs and compounds for COVID-19 during the pandemic. HBM is part of the high-performance computing systems that address the massive computing needs behind such critical challenges of our time. In this way, HBM fundamentally changes the perception of memory technology, establishing it as a vital enabler of massive computing systems by delivering the needed performance and capacity in a compact form factor while consuming significantly less power.
Today's data center industry faces power and space challenges as computing continues to scale in the age of AI. AI and high-performance computing (HPC) workloads drive higher memory utilization and capacities, and the energy required to cool the data center is a challenge of its own. In a system with HBM, the cooling solution sits at the top of the DRAM stack, while much of the heat, generated by the base die and the DRAM layers, originates at the bottom of the stack. Power and heat dissipation therefore have to be considered in the early stages of design. Micron’s advanced packaging innovations provide structural solutions that improve the thermal impedance and thereby the thermals of the cube. Combined with the significantly lower power consumption, the overall thermals are substantially better than those of competitors. Thus, the better power and thermal efficiency of Micron’s HBM3E helps address the significant challenges data centers face.
What are the emerging trends in AI memory solutions?
Generative AI is proliferating across applications from the cloud to the edge, fueling significant system architecture innovation in the heterogeneous computing environment. AI is accelerating trends that drive applications at the edge, such as Industry 4.0, autonomous cars, AI PCs and AI-enabled smartphones. As Girish shared, these secular trends drive significant innovation in the memory subsystem across technologies to deliver more capacity, bandwidth and reliability at lower power consumption.
Micron’s 1β-based LPDDR5X portfolio gives these systems best-in-class performance per watt for AI inference at the edge. Micron is also first to market with LPCAMM2, an innovative LPDDR5X-based form factor that transforms the experience for PC users and enables the AI PC revolution.
Data center architectures are also evolving. Micron's monolithic-die-based high-capacity RDIMMs fuel advances in AI, in-memory databases and general-purpose compute workloads for data center servers worldwide. Our first-to-market, high-capacity 128GB RDIMMs offer the performance, capacity and lower latency needed to efficiently process applications that require large memory footprints, including AI workloads offloaded from the GPU to the CPU.
We also see increasing adoption of LPDDR memory (low-power DRAM) in the data center for AI acceleration and inference applications because of its performance-per-watt advantage. Micron’s graphics memory, GDDR6X, which runs at an impressive 24 Gb/s pin speed, is also being adopted in data centers for inference applications.
Another emerging solution Micron is pioneering for memory and bandwidth expansion in data center applications is CXL™-attached memory. The Micron CZ120 CXL memory module delivers memory expansion for AI, in-memory databases, HPC and general-purpose compute workloads.
AI is ushering in a new era for humanity, touching every aspect of our lives. As society harnesses AI's potential, it will continue to drive rapid innovation across industries in the digital economy. Data is the heart of the digital economy, and that data resides in memory and storage solutions. With its technological capabilities, a leadership portfolio of innovative memory and storage solutions backed by a strong roadmap, and a commitment to transforming how the world uses information to enrich life for all, Micron is well positioned to accelerate this AI revolution.
1 Based on customer testing and feedback for Micron and competition HBM3E
2 Source: Internal Micron Models