AWS AI Infrastructure with NVIDIA Blackwell

Source: aws.amazon.com

Published on July 10, 2025

Imagine a system that can explore multiple approaches to a complex problem, drawing on its understanding of data, from datasets to source code to business documents, and reasoning through the possibilities. This kind of reasoning is happening today in customers' AI production environments. AI systems at this scale are being built across drug discovery, enterprise search, and software development.

To accelerate innovation across generative AI developments like reasoning models and agentic AI systems, we are announcing the general availability of P6e-GB200 UltraServers, accelerated by NVIDIA Grace Blackwell Superchips. P6e-GB200 UltraServers are designed for training and deploying large-scale AI models. Earlier this year, we launched P6-B200 instances, accelerated by NVIDIA Blackwell GPUs, for AI and high-performance computing workloads. Both offerings build on our track record of delivering secure, reliable GPU infrastructure at scale, so customers can keep pushing the boundaries of AI.

P6e-GB200 UltraServers

P6e-GB200 UltraServers are our most powerful GPU offering to date, featuring up to 72 NVIDIA Blackwell GPUs interconnected using fifth-generation NVIDIA NVLink and functioning as a single compute unit. Each UltraServer delivers 360 petaflops of dense FP8 compute and 13.4 TB of total high-bandwidth GPU memory (HBM3e), over 20 times the compute and over 11 times the memory in a single NVLink domain compared to P5en instances. P6e-GB200 UltraServers support up to 28.8 Tbps of aggregate fourth-generation Elastic Fabric Adapter (EFAv4) networking bandwidth.
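As a quick sanity check, the per-GPU figures implied by those UltraServer totals can be derived with simple division. This is illustrative arithmetic over the numbers quoted above, not an official per-GPU specification:

```python
# Back-of-the-envelope per-GPU figures derived from the quoted
# UltraServer totals (illustrative arithmetic only).

GPUS_PER_ULTRASERVER = 72
TOTAL_FP8_PFLOPS = 360      # dense FP8 compute per UltraServer
TOTAL_HBM3E_TB = 13.4       # total high-bandwidth GPU memory

fp8_per_gpu = TOTAL_FP8_PFLOPS / GPUS_PER_ULTRASERVER           # 5.0 PFLOPS
hbm_per_gpu_gb = TOTAL_HBM3E_TB * 1000 / GPUS_PER_ULTRASERVER   # ~186 GB

print(f"{fp8_per_gpu:.1f} PFLOPS dense FP8 per GPU")
print(f"~{hbm_per_gpu_gb:.0f} GB HBM3e per GPU")
```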

P6-B200 Instances

P6-B200 instances are a versatile option for a broad range of AI use cases. Each instance provides 8 NVIDIA Blackwell GPUs interconnected using NVLink with 1.4 TB of high-bandwidth GPU memory, up to 3.2 Tbps of EFAv4 networking, and fifth-generation Intel Xeon Scalable processors. P6-B200 instances offer up to 2.25 times the GPU TFLOPs, 1.27 times the GPU memory size, and 1.6 times the GPU memory bandwidth compared to P5en instances.
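Assuming the aggregate figures divide evenly across the 8 GPUs (an assumption for illustration; the 1.4 TB memory figure is rounded), the per-GPU share works out as follows:

```python
# Rough per-GPU share of the quoted P6-B200 aggregate figures,
# assuming an even split across 8 GPUs (illustrative only; the
# 1.4 TB total is a rounded number, so the memory result is approximate).

GPUS_PER_INSTANCE = 8
EFA_AGGREGATE_TBPS = 3.2    # EFAv4 networking per instance
HBM_TOTAL_TB = 1.4          # total high-bandwidth GPU memory (rounded)

efa_per_gpu_gbps = EFA_AGGREGATE_TBPS * 1000 / GPUS_PER_INSTANCE  # 400 Gbps
hbm_per_gpu_gb = HBM_TOTAL_TB * 1000 / GPUS_PER_INSTANCE          # ~175 GB

print(f"~{efa_per_gpu_gbps:.0f} Gbps network bandwidth per GPU")
print(f"~{hbm_per_gpu_gb:.0f} GB HBM per GPU")
```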

The choice between P6e-GB200 and P6-B200 comes down to your workload requirements and architectural needs. Bringing NVIDIA Blackwell to AWS isn't about a single breakthrough; it's about continuous innovation across the entire infrastructure stack. By building on years of learning and innovation across compute, networking, operations, and managed services, we have made NVIDIA Blackwell's capabilities available with the reliability and performance customers expect.

Customers value our focus on instance security and stability. The AWS Nitro System's hardware, software, and firmware are designed to enforce restrictions so that nobody, including anyone at AWS, can access sensitive AI workloads and data. The Nitro System handles networking, storage, and other I/O functions, making it possible to deploy firmware updates, bug fixes, and optimizations while the system remains operational. This ability to update without downtime, called live update, is crucial for workloads where any interruption impacts production timelines. P6e-GB200 and P6-B200 both feature the sixth generation of the Nitro System. The Nitro architecture has been protecting and optimizing Amazon Elastic Compute Cloud (Amazon EC2) workloads since 2017.

In AI infrastructure, the challenge is delivering performance and reliability at scale. P6e-GB200 UltraServers are deployed in third-generation EC2 UltraClusters, which create a single fabric that can encompass our largest data centers. Third-generation UltraClusters cut power consumption by up to 40% and reduce cabling requirements by more than 80%. To deliver performance at scale, Elastic Fabric Adapter (EFA) with its Scalable Reliable Datagram (SRD) protocol routes traffic across multiple network paths to maintain operation during congestion or failures. EFA's performance has improved across four generations: P6e-GB200 and P6-B200 instances with EFAv4 show up to 18% faster collective communications in distributed training compared to P5en instances that use EFAv3.
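The multipath idea behind SRD can be sketched in a few lines. This is a toy model only: the real protocol runs in the Nitro card, and the path selection and retransmission logic there are far more sophisticated than this spray-and-reassemble illustration.

```python
import random

# Toy illustration of the multipath idea behind EFA's Scalable Reliable
# Datagram (SRD) protocol: packets of one flow are sprayed across many
# network paths and may arrive out of order; the receiver reassembles
# them by sequence number. (Conceptual sketch only, not the real protocol.)

NUM_PATHS = 4
CONGESTED_PATH = 2  # pretend this path is congested or has failed

def spray(packets, num_paths=NUM_PATHS):
    """Assign each packet to a healthy path, avoiding the congested one."""
    healthy = [p for p in range(num_paths) if p != CONGESTED_PATH]
    return [(seq, random.choice(healthy)) for seq in packets]

def reassemble(arrivals):
    """Receiver orders packets by sequence number, not arrival order."""
    return [seq for seq, _path in sorted(arrivals)]

packets = list(range(8))
in_flight = spray(packets)
random.shuffle(in_flight)           # different paths => out-of-order arrival
assert reassemble(in_flight) == packets
```

Because no single flow depends on a single path, a congested or failed link degrades throughput gracefully instead of stalling the whole collective operation.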

While P6-B200 instances use proven air-cooling infrastructure, P6e-GB200 UltraServers use liquid cooling, which enables higher compute density in large NVLink domain architectures and, in turn, higher system performance. P6e-GB200 UltraServers are liquid cooled with mechanical cooling solutions that provide liquid-to-chip cooling, so we can support liquid-cooled accelerators alongside air-cooled network and storage infrastructure in the same facility. This cooling design lets us deliver performance and efficiency at the lowest cost.

It's simple to get started with P6e-GB200 UltraServers and P6-B200 instances through multiple deployment paths, so you can begin using Blackwell GPUs quickly while maintaining the operational model that works best for you.

If you're accelerating AI development and want to spend less time managing infrastructure and cluster operations, Amazon SageMaker HyperPod provides managed, resilient infrastructure that automatically handles provisioning and management of GPU clusters. SageMaker HyperPod will support both P6e-GB200 UltraServers and P6-B200 instances, with optimizations that maximize performance by keeping workloads within the same NVLink domain. A multi-layered recovery system is built in: SageMaker HyperPod will automatically replace faulty instances with preconfigured spares in the same NVLink domain. Built-in dashboards will give you visibility into everything from GPU utilization and memory usage to workload metrics and UltraServer health status.
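To make the managed path concrete, here is a sketch of the request payload a SageMaker HyperPod cluster creation might take. The cluster name, instance type string, S3 URI, and role ARN below are all placeholders or assumptions for illustration; check the SageMaker HyperPod documentation for the exact supported values before use.

```python
import json

# Sketch of a SageMaker HyperPod CreateCluster request for a
# Blackwell-based cluster. All names, the instance type string, the
# S3 URI, and the role ARN are hypothetical placeholders.

create_cluster_request = {
    "ClusterName": "blackwell-training",              # hypothetical name
    "InstanceGroups": [
        {
            "InstanceGroupName": "gpu-workers",
            "InstanceType": "ml.p6-b200.48xlarge",    # assumed type name
            "InstanceCount": 2,
            "LifeCycleConfig": {
                "SourceS3Uri": "s3://my-bucket/lifecycle/",  # placeholder
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodRole",
        }
    ],
}

# The request would then be sent with boto3 (not executed here):
#   boto3.client("sagemaker").create_cluster(**create_cluster_request)
print(json.dumps(create_cluster_request, indent=2))
```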

For large-scale AI workloads, if you prefer to manage your infrastructure using Kubernetes, Amazon Elastic Kubernetes Service (Amazon EKS) provides the control plane. Amazon EKS will support both P6e-GB200 UltraServers and P6-B200 instances with automated provisioning and lifecycle management through managed node groups. For P6e-GB200 UltraServers, built-in topology awareness understands the GB200 NVL72 architecture, automatically labeling nodes with their UltraServer ID and network topology information to enable optimal workload placement. You will be able to span node groups across multiple UltraServers or dedicate them to individual UltraServers, giving you flexibility in organizing your training infrastructure. Amazon EKS monitors GPU and accelerator errors and relays them to the Kubernetes control plane for remediation.
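The topology-aware placement described above can be illustrated with a small grouping exercise: given node labels carrying an UltraServer ID, group nodes so a job can be pinned to a single NVLink domain. The label key below is a hypothetical stand-in, not the actual label Amazon EKS applies; consult the EKS documentation for the real label names.

```python
from collections import defaultdict

# Illustrative topology-aware grouping: collect Kubernetes nodes by a
# (hypothetical) UltraServer-ID label so a training job can be scheduled
# entirely within one NVLink domain.

ULTRASERVER_LABEL = "example.aws/ultraserver-id"  # hypothetical label key

nodes = [
    {"name": "node-a", "labels": {ULTRASERVER_LABEL: "us-01"}},
    {"name": "node-b", "labels": {ULTRASERVER_LABEL: "us-01"}},
    {"name": "node-c", "labels": {ULTRASERVER_LABEL: "us-02"}},
]

def group_by_ultraserver(nodes):
    """Map UltraServer ID -> node names sharing that NVLink domain."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["labels"][ULTRASERVER_LABEL]].append(node["name"])
    return dict(groups)

placement = group_by_ultraserver(nodes)
# Pin a job to whichever NVLink domain currently has the most free nodes.
target = max(placement, key=lambda k: len(placement[k]))
assert target == "us-01"
```

In practice you would express the same intent declaratively, for example with a node selector or affinity rule keyed on the real topology labels, rather than grouping nodes by hand.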

P6e-GB200 UltraServers will also be available through NVIDIA DGX Cloud. DGX Cloud is an AI platform optimized at every layer with multi-node AI training and inference capabilities and NVIDIA’s AI software stack. It offers flexible term lengths along with NVIDIA expert support and services to help you accelerate your AI initiatives.

This launch is a significant milestone. As AI capabilities evolve, you need infrastructure built not only for today's demands but also for the possibilities ahead. With innovations across compute, networking, operations, and managed services, P6e-GB200 UltraServers and P6-B200 instances are ready to enable those possibilities.

David Brown is the Vice President of AWS Compute and Machine Learning (ML) Services. He is responsible for building all AWS Compute and ML services, including Amazon EC2, Amazon Container Services, AWS Lambda, Amazon Bedrock and Amazon SageMaker. He also leads solutions, such as AWS Outposts, that bring AWS services into customers’ private data centers.