May 22, 2025 – At Microsoft’s Build 2025 developer conference, NVIDIA CEO Jensen Huang and Microsoft CEO Satya Nadella laid out the next phase of a partnership driving the frontier of AI. With the Grace Blackwell GB300 now in full production on Azure and major advances in CUDA software, the two companies are delivering a roughly 40x performance gain over the Hopper architecture in just two years. The collaboration is not just about hardware: it is about maximizing “tokens per dollar per watt,” accelerating AI end to end, and transforming data centers into AI factories that power diverse workloads.

Grace Blackwell: Hyperdriving Moore’s Law
The NVIDIA Grace Blackwell GB300, now scaling across Azure, is a major leap in AI computing. Signed by Jensen Huang himself, the system delivers 40 petaflops of performance, surpassing the supercomputers of just a few years ago. Compared with its GB200 predecessor, the GB300 offers 1.5x more inference performance, 1.5x more HBM memory, and 2x more networking bandwidth, all in a liquid-cooled design built around FP4 Tensor Cores. Together with software gains, that adds up to roughly 40x faster processing than Hopper for AI workloads, from large language models to agentic AI that “thinks” before responding.
The secret sauce? NVIDIA’s NVLink technology, which scales compute nodes up via a 7.2-terabytes-per-second switch, paired with a coherent memory connection between Grace CPUs and Blackwell GPUs that is optimized for KV caching, a critical ingredient for advanced AI models. This hyperdrive of Moore’s Law keeps Azure’s AI supercomputers the largest and most advanced in the world.
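To see why that memory link matters, consider what KV caching actually does. Below is a minimal, illustrative sketch in toy NumPy (not NVIDIA’s implementation; the dimension `D` and the random weights are arbitrary assumptions): each decoding step computes keys and values only for the newest token, while attention reads the ever-growing cache, which is exactly the working set a fast, coherent CPU-GPU memory path is built to serve.

```python
# Illustrative sketch of KV caching in autoregressive decoding.
# Each step appends one key/value row instead of recomputing them
# for the whole sequence; attention then reads the full cache.
import numpy as np

D = 64  # toy head dimension (assumption for illustration)

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(D)            # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                     # (D,)

rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.02 for _ in range(3))

K_cache = np.empty((0, D))  # grows by one row per generated token
V_cache = np.empty((0, D))

x = rng.standard_normal(D)  # embedding of the current token
for step in range(8):
    # Only the NEW token's key/value are computed each step...
    K_cache = np.vstack([K_cache, x @ Wk])
    V_cache = np.vstack([V_cache, x @ Wv])
    # ...while attention reads the whole cache, so keeping that cache
    # in large, fast, coherent memory pays off for long contexts.
    x = attend(x @ Wq, K_cache, V_cache)
```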
Why It Matters: Grace Blackwell empowers developers to tackle complex AI tasks, like real-time reasoning and large-scale model training, with unprecedented efficiency, making Azure the go-to platform for cutting-edge AI innovation.
CUDA: The Software Engine Behind AI Acceleration
NVIDIA’s CUDA platform is the software engine behind this leap: optimizations such as in-flight batching, speculative decoding, and prompt caching account for much of the 40x improvement over the original Hopper baseline. Because these advances ship as software, even older NVIDIA architectures, such as Ampere-generation A100s, keep benefiting from new algorithms, making the entire Azure GPU fleet more efficient over time.
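One of those techniques, speculative decoding, is easy to see in miniature. In the sketch below (illustrative only: the “models” are trivial arithmetic rules standing in for a small draft network and a large target network), the draft proposes a few tokens and the target validates them in one pass, so several tokens can be committed for the cost of a single expensive forward pass.

```python
# Toy sketch of speculative decoding: a cheap "draft" proposes k tokens,
# the expensive "target" checks them, and the longest agreeing prefix is
# accepted. The token rules here are placeholders, not a real model.
def draft_model(context, k=4):
    """Cheap model: propose k next tokens (trivial stand-in rule)."""
    return [(context[-1] + i + 1) % 100 for i in range(k)]

def target_model(context, proposed):
    """Expensive model: the token it would emit at each position."""
    out = []
    for i in range(len(proposed)):
        prefix = context + proposed[:i]
        out.append((prefix[-1] + 1) % 100)  # toy next-token rule
    return out

def speculative_step(context, k=4):
    proposed = draft_model(context, k)
    checked = target_model(context, proposed)
    accepted = []
    for p, t in zip(proposed, checked):
        if p != t:              # first disagreement: keep target's token, stop
            accepted.append(t)
            break
        accepted.append(p)      # agreement: this token is accepted "for free"
    return context + accepted

context = [42]
for _ in range(3):
    context = speculative_step(context)
print(context)  # several tokens committed per target-model pass
```

In production stacks the acceptance test is probabilistic and both models are real networks; the toy rules above only show the control flow that makes the speedup possible.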
Huang emphasized CUDA’s architectural stability, which lets developers build on a consistent platform from Pascal through Blackwell and beyond. That stability fosters a rich ecosystem: researchers and developers can invest in optimizing models knowing their work will carry forward to millions of users across Azure’s vast installed base.
Why It Matters: CUDA compatibility means software innovations, like those behind Llama 70B and other Transformer-based models, improve performance across every generation of NVIDIA GPUs, maximizing ROI for Azure customers and developers.
AI Factories: The New Unit of Computing
Nadella and Huang described the modern data center as an AI factory: an entire fleet of GPUs operating as a single unit to process AI workloads. Unlike traditional PCs or servers, these factories are refreshed annually to capture NVIDIA’s steep generational leaps, such as the 40x jump from Hopper to Blackwell. By folding in each new architecture as it arrives, Azure keeps its fleet at the cutting edge and delivers cost-averaged performance that benefits customers over the fleet’s four-to-five-year lifespan.
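To make “cost-averaged performance” concrete, here is a back-of-the-envelope sketch. The fleet mix and the Ampere figure are made-up assumptions for illustration; only the 40x Hopper-to-Blackwell leap echoes the keynote.

```python
# Back-of-the-envelope sketch of "cost-averaged" fleet performance
# under annual refreshes. Shares and the Ampere figure are assumptions.
relative_perf = {"Ampere": 0.3, "Hopper": 1.0, "Blackwell": 40.0}  # Hopper = 1

fleet_share = {"Ampere": 0.3, "Hopper": 0.4, "Blackwell": 0.3}     # hypothetical mix

avg = sum(fleet_share[gen] * perf for gen, perf in relative_perf.items())
print(f"Fleet-average throughput: {avg:.1f}x Hopper")  # ~12.5x with this mix
# Each yearly refresh shifts the mix toward the newest generation, so the
# average climbs even while older GPUs serve out their 4-5 year lifespan.
```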
This “speed of light execution” between NVIDIA and Microsoft enables tight integration, allowing Azure to deploy the world’s largest AI supercomputer just two years after the companies’ previous record system. Building these factories is enormously complex, but the payoff is scalable, high-performance infrastructure that powers AI innovation globally.
Why It Matters: AI factories transform data centers into dynamic, scalable systems, enabling businesses, researchers, and developers to harness AI for everything from real-time analytics to enterprise applications.
Diverse Workloads, Maximum Utilization
CUDA’s versatility extends well beyond frontier AI, accelerating a long tail of workloads: data processing (20x to 50x faster), video transcoding, image processing, recommender systems, and vector search engines. As frontier models move to Blackwell, older GPUs such as Ampere stay heavily utilized on these tasks, ensuring Azure’s fleet delivers value across its entire lifecycle.
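Vector search is a good example of that long tail. Here is a hedged sketch of brute-force cosine-similarity search on the GPU using CuPy (an assumption of this example, not something named in the keynote; it requires a CUDA-capable GPU and an installed `cupy` package):

```python
# Illustrative brute-force vector search on the GPU with CuPy.
# This is the kind of long-tail workload, not frontier training,
# that keeps older parts of a GPU fleet busy.
import cupy as cp

def top_k(corpus, query, k=5):
    """Indices of the k corpus vectors most cosine-similar to the query."""
    corpus_n = corpus / cp.linalg.norm(corpus, axis=1, keepdims=True)
    query_n = query / cp.linalg.norm(query)
    sims = corpus_n @ query_n           # one large matrix-vector product on the GPU
    return cp.argsort(sims)[-k:][::-1]  # best matches, highest similarity first

corpus = cp.random.standard_normal((100_000, 128), dtype=cp.float32)
query = cp.random.standard_normal(128, dtype=cp.float32)
print(top_k(corpus, query))
```

For corpora this size, a single GEMV saturates even an older GPU; dedicated vector databases add indexing on top, but the core arithmetic is exactly this kind of dense linear algebra.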
Microsoft’s Azure Container Service adds further flexibility, letting customers run any AI agent or workload on the GPU fleet while balancing latency, cost of goods sold (COGS), and performance. Startups and enterprises alike can therefore put NVIDIA GPUs to work on a wide range of applications, not just cutting-edge AI.
Why It Matters: The ability to accelerate diverse workloads maximizes the value of Azure’s GPU fleet, making it a cost-effective solution for businesses of all sizes.
Tags: NVIDIA, Microsoft Azure, Grace Blackwell, CUDA, AI Supercomputer, Build 2025, AI Factory, Moore’s Law, Cloud Computing, Artificial Intelligence, NVLink, Developer Tools