NVIDIA's ComputeEval 2025.2: A More Challenging Benchmark for AI-Generated CUDA Code

Published on November 7, 2025
NVIDIA has announced ComputeEval 2025.2, a major update to its open-source benchmark for evaluating how well AI models and agents write CUDA code. Released on November 7, 2025, this version adds more than 100 new CUDA challenges, bringing the total to 232 problems spanning CUDA and the CUDA Core Compute Libraries (CCCL). The updated benchmark aims to assess and improve the ability of AI coding assistants to write correct, efficient CUDA code. The new challenges are deliberately harder, requiring LLMs to leverage modern CUDA features (see the sketch after this list), including:
  • Tensor Cores
  • Advanced shared memory patterns
  • Warp-level primitives
  • CUDA Graphs, Streams, and Events
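To give a sense of what these features look like in practice, below is a minimal, self-contained sketch written for this article, not drawn from the benchmark itself: a block-wide sum reduction that combines warp-level primitives (__shfl_down_sync) with a shared-memory staging step. All function and variable names are illustrative.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reduce a value across the 32 lanes of a warp using shuffle intrinsics.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 of each warp ends up holding the warp's sum
}

// Sum 'n' floats: each block reduces its slice, then atomically adds to *out.
__global__ void blockReduceSum(const float* in, float* out, int n) {
    __shared__ float warpSums[32];          // one partial sum per warp (max 32 warps/block)
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;

    float val = (tid < n) ? in[tid] : 0.0f;
    val = warpReduceSum(val);               // reduce within each warp (registers only)

    if (lane == 0) warpSums[warp] = val;    // stage one value per warp in shared memory
    __syncthreads();

    // The first warp reduces the per-warp partial sums.
    if (warp == 0) {
        int numWarps = (blockDim.x + warpSize - 1) / warpSize;
        val = (lane < numWarps) ? warpSums[lane] : 0.0f;
        val = warpReduceSum(val);
        if (lane == 0) atomicAdd(out, val); // accumulate the block's result globally
    }
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;

    blockReduceSum<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("sum = %.0f (expected %d)\n", *out, n);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Reducing within each warp first keeps most of the traffic in registers, and shared memory only has to hold one partial sum per warp; choosing and combining such primitives correctly is the kind of trade-off the new challenges appear designed to probe.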
Evaluations of leading large language models (LLMs) on ComputeEval 2025.2 show lower scores than on the previous release, ComputeEval 2025.1, indicating that the new challenges effectively raise the bar and demand a deeper understanding of accelerated computing.

NVIDIA plans to further expand ComputeEval's coverage to additional CUDA-X libraries such as cuBLAS, CUTLASS, cuDNN, and RAPIDS. The company is inviting collaboration and contributions from the HPC and AI communities; the code is available on GitHub and the dataset on Hugging Face.