CATArena: New AI Tournament Framework Evaluates Learning and Strategy Coding in LLM Agents
Published on November 3, 2025 at 05:00 AM
A new framework called CATArena has been developed to evaluate the learning abilities of Large Language Model (LLM) agents through iterative tournament competitions. Proposed by researchers at Shanghai Jiao Tong University, AGI-Eval, and Meituan, CATArena addresses the limitations of current benchmarks that primarily assess end-to-end performance in fixed scenarios. It emphasizes learning ability, covering both self-improvement and peer learning, as crucial for agents to evolve toward human-level intelligence.
The CATArena platform features four diverse board and card games with open-ended scoring, enabling continuous and dynamic evaluation of agent capabilities. The framework allows agents to refine their strategies through repeated interactions and feedback. Experiments involving both minimal and commercial code agents demonstrate that CATArena provides reliable benchmarking for core agent abilities, particularly learning ability and strategy coding.
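In concrete terms, this kind of evaluation loop can be pictured as a round-robin tournament in which agents submit strategy code, receive match results and peers' strategies as feedback, and revise their code before the next round. The Python sketch below is purely illustrative and based on that description, not on the framework's actual API; the names Agent, play_match, revise_strategy, and run_tournament are assumptions.

```python
# Illustrative sketch of a CATArena-style iterative tournament loop.
# All class and function names here are hypothetical, not the real API.
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Agent:
    name: str
    strategy_code: str                                # strategy written by the LLM agent
    history: list = field(default_factory=list)       # feedback from past rounds

    def revise_strategy(self, feedback: dict) -> None:
        """Placeholder: an LLM call would rewrite strategy_code here,
        using its own results (self-improvement) and opponents'
        disclosed strategies (peer learning)."""
        self.history.append(feedback)
        # self.strategy_code = llm_rewrite(self.strategy_code, feedback)


def play_match(game: str, a: Agent, b: Agent) -> dict:
    """Placeholder: run one game between two agents' strategies and
    return an open-ended score for each (no fixed upper bound)."""
    return {a.name: 0.0, b.name: 0.0}


def run_tournament(games: list, agents: list, rounds: int = 3) -> dict:
    scores = {agent.name: 0.0 for agent in agents}
    for _ in range(rounds):
        round_results = []
        # Round-robin: every pair of agents meets on every game.
        for game in games:
            for a, b in combinations(agents, 2):
                result = play_match(game, a, b)
                round_results.append(result)
                for name, score in result.items():
                    scores[name] += score
        # After each round, agents see the results and peers' strategies,
        # then refine their own strategy code for the next round.
        for agent in agents:
            feedback = {
                "results": round_results,
                "peer_strategies": {a.name: a.strategy_code for a in agents},
            }
            agent.revise_strategy(feedback)
    return scores
```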
The framework's contributions include an iterative peer-learning-based competitive environment and a tournament-style benchmark with extensible evaluation across diverse tasks. By focusing on strategy coding, CATArena introduces an evaluation dimension not addressed in previous work and distinguishes itself from traditional LLM reasoning tasks.