We’re proud to unveil a major milestone in AI infrastructure performance: 230 million IOPS achieved using the NVIDIA SCADA programming model, Micron 9650 PCIe® Gen6 SSDs, Broadcom PEX90000 PCIe Gen6 switches and the H3 Platform Falcon 6048 PCIe Gen6 server.
Scaled Accelerated Data Access (SCADA) is a secure programming model and technology stack introduced in the paper "GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture." It is a major storage ecosystem initiative in which NVIDIA, Micron and others are working together to define and implement a new class of infrastructure for accessing massive datasets that exceed local memory limits. SCADA prevents out-of-memory errors by using NVMe-based load/store operations, and it relocates storage control to trusted DPUs to maintain high performance and protect shared data from compromised compute nodes.
This result showcases the power of GPU-driven storage orchestration combined with next-generation interconnects and the world’s fastest SSDs.
You can see this demo live in the Micron booth (#3516) at SC25 — details are provided at the end of this blog.
Micron 9650: The fastest SSD in the world
The Micron 9650 SSD isn’t just about raw performance — it’s about enabling next-generation AI and HPC workloads with a balanced blend of speed, energy efficiency and interoperability. As the world’s first PCIe Gen6 SSD, announced at FMS 2025 (Micron Unveils Portfolio of Industry-First SSDs to Power the AI Revolution), it delivers record-breaking throughput and IOPS while also supporting robust ecosystem integration. For the last two years, Micron has collaborated closely with partners across the PCIe Gen6 landscape to conduct extensive interoperability testing, paving the way for broad adoption this year. With its PCIe Gen6 architecture and optimization for small-block operations, the Micron 9650 is purpose-built for GPU-driven environments like NVIDIA SCADA.
NVIDIA SCADA supercharges throughput to scale AI workloads
SCADA (Scaled Accelerated Data Access) represents NVIDIA’s vision for GPU-initiated storage operations. It bypasses traditional CPU bottlenecks by establishing a direct connection between GPUs and storage, which accelerates data transfer between them. SCADA is the result of many years of NVIDIA research and engineering aimed at letting GPUs orchestrate NVMe transactions directly. It delivers unprecedented throughput and IOPS for the small-block access patterns that are critical to scaling AI workloads such as graph neural networks (used in medication discovery, social networks and knowledge graphs). For more information about SCADA, please refer to this NVIDIA presentation from the FMS 2025 conference: Advancing Memory and Storage Architectures for Next-Gen AI Workloads.
Broadcom and H3: Cutting-edge server platforms at work
Powering this orchestration is the H3 Falcon 6048 PCIe Gen6 server, integrated with Broadcom’s PEX90000 PCIe Gen6 switch series. These PCIe Gen6 switches provide ultra-low latency, high bandwidth and exceptional port density, enabling robust scalability and seamless interconnectivity between GPUs and NVMe devices.
The PCIe Gen6 switches are deployed within the H3 Platform Falcon 6048 server, a system that unifies accelerators and storage into a single PCIe Gen6-optimized fabric. It supports 44 E1.S Micron 9650 SSDs, each connected through PCIe Gen6 x4. H3’s advanced telemetry and diagnostics simplify large-scale AI fabric management, while extensive interoperability testing with CPUs, GPUs, SSDs (notably the Micron 9650), NICs and retimers ensures reliable, worry-free deployment.
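As a rough sanity check on the fabric described above, the SSD-side lane count and raw link bandwidth can be worked out from the 44 drives and their x4 links. This is a back-of-the-envelope sketch, not a measurement. It assumes the PCIe 6.0 signaling rate of 64 GT/s per lane and reports raw, per-direction bandwidth before FLIT, protocol and NVMe overheads. The constant and function names are illustrative only.

```python
# Back-of-the-envelope lane and bandwidth budget for the SSD side of the fabric.
# Assumption: PCIe 6.0 signals at 64 GT/s per lane; results are raw,
# per-direction figures that ignore FLIT/protocol/NVMe overhead.

PCIE_GEN6_GTPS_PER_LANE = 64   # PCIe 6.0 raw signaling rate per lane
SSD_COUNT = 44                 # number of E1.S SSDs, from the article
LANES_PER_SSD = 4              # each SSD connects at PCIe Gen6 x4


def raw_link_gbytes_per_s(lanes: int, gtps_per_lane: int) -> float:
    """Raw one-direction link bandwidth in GB/s, treating 1 GT/s as 1 Gb/s per lane."""
    return lanes * gtps_per_lane / 8


ssd_lanes_total = SSD_COUNT * LANES_PER_SSD
per_ssd_raw = raw_link_gbytes_per_s(LANES_PER_SSD, PCIE_GEN6_GTPS_PER_LANE)
aggregate_raw = SSD_COUNT * per_ssd_raw

print(f"SSD-facing lanes: {ssd_lanes_total}")
print(f"Raw bandwidth per SSD link: {per_ssd_raw:.0f} GB/s")
print(f"Raw aggregate SSD link bandwidth: {aggregate_raw:.0f} GB/s")
```

Under these assumptions the SSDs occupy 176 switch lanes, and each x4 link offers about 32 GB/s of raw bandwidth per direction. Achievable payload bandwidth is lower once overheads are accounted for.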
The demo: 230 million IOPS in action
Our SC25 demo is not just a proof point — it’s a milestone. Using a Falcon 6048 server from H3 Platform configured with:
- 44x Micron 9650 PCIe Gen6 SSDs (E1.S, 7.68TB)
- 3x NVIDIA H100 PCIe Gen5 GPUs with NVL 96GB HBM3
- 1x Intel PCIe Gen5 CPU
- 3x Broadcom PEX90000 PCIe Gen6 series switches (144 lanes each)
We achieved 230 million 512B random read IOPS with the SOL benchmark SCADA workload. This benchmark measures the random IOPS that GPU threads can achieve when accessing data from a group of SSDs. Performance scaled linearly from 1 to 44 SSDs, validating the combined value of GPU-driven IO and PCIe Gen6 infrastructure.
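To put the headline number in bandwidth terms, the short calculation below converts 230 million 512B reads per second into aggregate and per-device rates. It is a sketch built only on the figures quoted in this post. The even split across SSDs and GPUs is an assumption that is consistent with the reported linear scaling, not a measured per-device result, and the variable names are illustrative.

```python
# Convert the headline IOPS figure into bandwidth and per-device rates.
# Assumption: load is spread evenly across SSDs and GPUs (consistent with the
# linear scaling reported in the article, but not a measured per-device value).

TOTAL_IOPS = 230_000_000   # 512B random reads per second, from the article
IO_SIZE_BYTES = 512
SSD_COUNT = 44
GPU_COUNT = 3

aggregate_gb_per_s = TOTAL_IOPS * IO_SIZE_BYTES / 1e9
iops_per_ssd = TOTAL_IOPS / SSD_COUNT
gb_per_s_per_ssd = iops_per_ssd * IO_SIZE_BYTES / 1e9
iops_per_gpu = TOTAL_IOPS / GPU_COUNT

print(f"Aggregate payload bandwidth: {aggregate_gb_per_s:.2f} GB/s")
print(f"Average per SSD: {iops_per_ssd / 1e6:.2f} M IOPS, {gb_per_s_per_ssd:.2f} GB/s")
print(f"Average per GPU: {iops_per_gpu / 1e6:.2f} M IOPS")
```

With these inputs the aggregate payload works out to roughly 117.8 GB/s, or about 5.2 million IOPS per SSD on average. The small 512B transfer size is why the IOPS count, rather than raw bandwidth, is the more demanding figure here.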
We tuned the SOL benchmark to run three instances with the 44 devices spread across them, 256 iterations (I/O per queue pair * 512) and eight queue pairs to get maximum performance.
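One simple way to spread 44 devices across three benchmark instances is round-robin assignment, sketched below. This is only an illustration of the idea. The post does not say how the devices were actually divided or named, so the integer device indices and the `spread_devices` helper are hypothetical.

```python
# Spread device indices across benchmark instances in round-robin order.
# Assumption: devices are identified by indices 0..43; the real device naming
# and the exact split used in the demo are not given in the article.

def spread_devices(device_count: int, instance_count: int) -> list[list[int]]:
    """Return one list of device indices per instance, balanced to within one device."""
    groups: list[list[int]] = [[] for _ in range(instance_count)]
    for dev in range(device_count):
        groups[dev % instance_count].append(dev)
    return groups


groups = spread_devices(device_count=44, instance_count=3)
for instance, devices in enumerate(groups):
    print(f"instance {instance}: {len(devices)} devices -> {devices}")
```

Because 44 is not divisible by three, any balanced split leaves two instances with 15 devices and one with 14.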
Why it matters for AI and HPC
As AI models grow in complexity and use more data at inference time, storage can become a bottleneck. SCADA flips the paradigm by letting GPUs drive IO directly with storage, reducing latency and maximizing bandwidth utilization. Combined with high-performance PCIe Gen6 SSDs, this architecture enables real-time data access for workloads like vector databases, graph neural networks and large-scale inference pipelines.
See it live
Visit the Micron booth at SC25 in St. Louis (Nov. 18–20), booth #3516, to experience this breakthrough. We’ll have:
- An animation illustrating the system architecture and performance metrics.
- An open-top H3 Falcon 6048 server system showcasing the hardware stack with Micron 9650 SSDs, NVIDIA H100 GPUs and DDR5 DRAM.
- A hardware sample of the Broadcom PEX90000 PCIe Gen6 series switches.
- Experts on hand to discuss how SCADA, PCIe Gen6, Broadcom PCIe switches and Micron SSDs are shaping the future of AI infrastructure.