NVIDIA Blackwell Ultra Shatters AI Inference Records, Ushering in a New Era of Performance
NVIDIA has once again redefined the boundaries of AI performance, with its groundbreaking Blackwell Ultra architecture delivering unprecedented results on the latest MLPerf Inference v5.1 benchmark suite. This isn’t just about faster numbers; it’s about fundamentally changing what’s possible with large language models (LLMs) and AI-powered applications: driving down costs, boosting productivity, and accelerating innovation.
A Leap Forward in Inference Speed & Efficiency
The NVIDIA GB300 NVL72 rack-scale system, powered by Blackwell Ultra, has set new records across a wide range of inference tasks. Specifically, it achieved up to 45% higher throughput on the DeepSeek-R1 model compared to systems built on the previous-generation Blackwell GB200 NVL72. This translates directly into faster response times for users and the ability to handle significantly larger workloads.
But the improvements don’t stop there. Blackwell Ultra builds on the already formidable Blackwell architecture, boasting 1.5x more NVFP4 AI compute and 2x faster attention-layer acceleration. Combined with up to 288GB of high-bandwidth HBM3e memory per GPU, this creates a powerhouse for demanding AI inference tasks.
Dominating the MLPerf Landscape
NVIDIA didn’t just excel in one area. The platform achieved record-breaking performance on all of the new data center benchmarks in MLPerf Inference v5.1, including:
* DeepSeek-R1
* Llama 3.1 405B Interactive
* Llama 3.1 8B
* Whisper
Furthermore, NVIDIA continues to hold the per-GPU performance lead on every MLPerf data center benchmark – a testament to the thorough optimization across hardware and software.
The Power of Full-Stack Co-Design
These results aren’t simply a matter of powerful hardware. NVIDIA’s success stems from a holistic, full-stack co-design approach. A key element is the NVFP4 data format: a 4-bit floating point format designed by NVIDIA that delivers better accuracy than other FP4 formats, rivaling even higher-precision alternatives.
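To make the format concrete, here is a minimal NumPy sketch of block-scaled 4-bit quantization in the spirit of NVFP4. The E2M1 value grid, the 16-element block size, and the max-based scaling are stated assumptions about the format; NVIDIA’s actual kernels and scale encoding are more sophisticated.

```python
import numpy as np

# The eight non-negative magnitudes representable by an E2M1 4-bit float
# (2 exponent bits, 1 mantissa bit); a sign bit doubles this to 16 codes.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quantize_nvfp4(x, block_size=16):
    """Round each block of x to the E2M1 grid under a shared per-block scale."""
    out = np.empty_like(x, dtype=np.float32)
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # Per-block scale maps the largest magnitude onto the grid max (6.0).
        scale = np.max(np.abs(block)) / E2M1_GRID[-1]
        if scale == 0.0:
            scale = 1.0  # all-zero block; any scale works
        # Snap each scaled magnitude to its nearest representable value.
        idx = np.abs(np.abs(block / scale)[:, None] - E2M1_GRID).argmin(axis=1)
        out[start:start + block_size] = np.sign(block) * E2M1_GRID[idx] * scale
    return out

weights = np.random.randn(64).astype(np.float32)
err = np.max(np.abs(weights - fake_quantize_nvfp4(weights)))
print(f"max abs quantization error: {err:.4f}")
```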
NVIDIA’s TensorRT Model Optimizer software plays a crucial role, intelligently quantizing models such as DeepSeek-R1, Llama 3.1, and Llama 2 to NVFP4. Paired with the open-source TensorRT-LLM library, this optimization unlocks significant performance gains while maintaining the accuracy needed for real-world applications.
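In practice, post-training quantization with Model Optimizer follows a calibrate-then-quantize flow. The sketch below assumes the nvidia-modelopt Python package and an NVFP4 recipe named NVFP4_DEFAULT_CFG; the exact config name, model support, and export path depend on the installed version.

```python
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.1-8B"          # any causal LM checkpoint
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

def forward_loop(m):
    # Calibration pass: run representative prompts through the model so the
    # quantizer can observe activation ranges before choosing scales.
    batch = tokenizer("MLPerf Inference v5.1 results", return_tensors="pt")
    m(**batch)

# NVFP4_DEFAULT_CFG is an assumption about the recipe name; earlier releases
# ship FP8/INT4 recipes under similar *_DEFAULT_CFG names.
model = mtq.quantize(model, mtq.NVFP4_DEFAULT_CFG, forward_loop)
# The quantized checkpoint can then be exported for TensorRT-LLM deployment.
```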
Optimizing for Real-World Workloads
LLM inference involves distinct phases: processing user input (context) and generating the output (generation). NVIDIA’s innovative “disaggregated serving” technique separates these tasks, allowing each to be optimized independently. This approach was instrumental in achieving a nearly 50% performance increase per GPU on the Llama 3.1 405B Interactive benchmark, compared to conventional serving methods.
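The core idea can be illustrated with a toy producer-consumer sketch: prefill workers turn prompts into KV caches, and decode workers stream tokens from those caches. All class, queue, and function names here are illustrative; this is not the Dynamo or TensorRT-LLM API.

```python
import queue
import threading
import time
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    kv_cache: str = ""   # handle produced by prefill, consumed by decode

prefill_q: queue.Queue = queue.Queue()   # feeds the context (prefill) pool
decode_q: queue.Queue = queue.Queue()    # feeds the generation (decode) pool

def prefill_worker():
    # Compute-bound phase: process the whole prompt once, emit a KV cache.
    while True:
        req = prefill_q.get()
        req.kv_cache = f"kv[{req.prompt!r}]"   # stand-in for real attention KV
        decode_q.put(req)                      # hand off to the decode pool

def decode_worker():
    # Memory-bandwidth-bound phase: stream output tokens from the KV cache.
    while True:
        req = decode_q.get()
        print(f"{req.kv_cache} -> generated tokens")

threading.Thread(target=prefill_worker, daemon=True).start()
threading.Thread(target=decode_worker, daemon=True).start()
prefill_q.put(Request("Explain MLPerf in one sentence."))
time.sleep(0.5)   # let the daemon workers drain the queues before exit
```

Because the two pools scale independently, operators can add prefill capacity for long-prompt traffic or decode capacity for long-generation traffic without over-provisioning the other phase.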
NVIDIA also debuted submissions utilizing its new Dynamo inference framework, further demonstrating its commitment to pushing the boundaries of AI performance.
A Collaborative Ecosystem Driving Innovation
NVIDIA’s partners are also contributing to this success. Leading cloud service providers and server manufacturers – including Azure, Broadcom, Cisco, CoreWeave, Dell Technologies, HPE, Oracle, and Supermicro – have submitted impressive results using NVIDIA Blackwell and Hopper platforms. This collaborative ecosystem ensures that the benefits of NVIDIA’s advancements are widely available.
Lower TCO, Higher ROI
The market-leading inference performance of the NVIDIA AI platform translates directly into tangible benefits for organizations. Expect lower total cost of ownership (TCO) and a significantly improved return on investment when deploying complex AI applications. Faster inference means more users served, more tasks completed, and ultimately, greater value delivered.
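As a back-of-the-envelope illustration using the 45% DeepSeek-R1 uplift cited above: at equal rack cost, a 1.45x throughput gain cuts cost per token by roughly 31%. The absolute dollar and throughput figures below are hypothetical placeholders; only the ratio is grounded in the benchmark result.

```python
RACK_COST_PER_HOUR = 100.0            # hypothetical fully loaded $/hour
BASELINE_TOKENS_PER_SEC = 1_000.0     # hypothetical GB200 NVL72 throughput
ULTRA_TOKENS_PER_SEC = BASELINE_TOKENS_PER_SEC * 1.45   # +45% on GB300 NVL72

def cost_per_million_tokens(tokens_per_sec: float) -> float:
    return RACK_COST_PER_HOUR / (tokens_per_sec * 3600) * 1_000_000

base = cost_per_million_tokens(BASELINE_TOKENS_PER_SEC)
ultra = cost_per_million_tokens(ULTRA_TOKENS_PER_SEC)
print(f"baseline: ${base:.2f}/M tokens, Blackwell Ultra: ${ultra:.2f}/M tokens")
print(f"cost reduction at equal rack cost: {1 - ultra / base:.0%}")  # ~31%
```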
Dive Deeper
* Explore the detailed results and analysis in the NVIDIA Technical Blog on MLPerf Inference v5.1.
* Utilize the NVIDIA DGX Cloud Performance Explorer to analyze performance, model TCO, and generate custom reports.
This isn’t just an incremental advancement; it’s a paradigm shift in what organizations can expect from AI inference.