NVIDIA Grove Streamlines Complex AI Inference on Kubernetes

NVIDIA announced the availability of NVIDIA Grove, a Kubernetes API designed to simplify the orchestration of modern machine learning (ML) inference workloads on Kubernetes clusters. Modern inference systems are no longer single pods but multicomponent pipelines (for example, prefill workers, decode workers, and routers) that must start, scale, and recover together. Grove addresses this complexity by letting users declare the entire system as a single resource and scale it from a single replica to deployments spanning tens of thousands of GPUs.
Key Features of NVIDIA Grove
Grove offers several advanced features to optimize AI inference workloads:
- Multilevel Autoscaling: Scales individual components, related component groups, and entire service replicas.
- System-Level Lifecycle Management: Manages recovery and updates for complete service instances.
- Flexible Hierarchical Gang Scheduling: Enforces minimum viable component combinations while allowing flexible scaling.
- Topology-Aware Scheduling: Optimizes component placement based on network topology.
- Role-Aware Orchestration: Ensures reliable initialization with role-specific configuration and dependencies.
Hierarchical Custom Resources
Grove uses three hierarchical custom resources in its Workload API to orchestrate multicomponent AI workloads:
- PodCliques: Groups of Kubernetes pods with specific roles and independent configuration.
- PodCliqueScalingGroups: Bundles of tightly coupled PodCliques that scale together.
- PodCliqueSets: Defines the entire multicomponent workload, specifying startup ordering and scaling policies.
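The three resources nest inside one another: a PodCliqueSet contains PodCliques and PodCliqueScalingGroups. As a rough sketch of that hierarchy (the field names below are illustrative approximations, not the exact Grove CRD schema), a disaggregated serving workload might be declared like this:

```yaml
# Hypothetical sketch of the Grove Workload API hierarchy.
# Field names and the API version are assumptions, not the published schema.
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: inference-service
spec:
  replicas: 2                      # scale entire service instances
  template:
    cliques:
      - name: frontend             # PodClique: a role with its own config
        replicas: 1
        podSpec: {}                # regular Kubernetes pod template goes here
    scalingGroups:
      - name: prefill-decode       # PodCliqueScalingGroup: cliques that
        cliques: [prefill, decode] # gang-schedule and scale together
```

The key design point is that scaling policy lives at three levels: individual PodCliques, tightly coupled PodCliqueScalingGroups, and whole PodCliqueSet replicas, matching the multilevel autoscaling described above.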
Getting Started with Grove
NVIDIA provides a step-by-step guide to deploying a disaggregated serving architecture with Dynamo, which includes a KV-routing deployer and the Qwen3 0.6B model. Grove is fully open source, and NVIDIA encourages community contributions, pull requests, and feedback.
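Once the Grove CRDs are installed in a cluster, deployment follows the standard Kubernetes custom-resource workflow. A minimal sketch (the manifest filename is hypothetical, and the plural resource name `podcliquesets` is assumed from the CRD's kind):

```shell
# Apply a PodCliqueSet manifest (filename is hypothetical)
kubectl apply -f podcliqueset.yaml

# Inspect the Grove-managed resources and the pods they create
# (assumes the CRD's plural name is "podcliquesets")
kubectl get podcliquesets
kubectl get pods -o wide
```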
Showcasing at KubeCon 2025
NVIDIA will showcase Grove at KubeCon + CloudNativeCon North America 2025 in Atlanta, where attendees can learn more about its capabilities and potential use cases.