Game-TARS: A Generalist AI Agent Masters Games and Beyond with Human-Native Interaction

Game-TARS: A Generalist AI Agent for Games and Beyond
ByteDance Seed has introduced Game-TARS, a generalist AI agent for games and other virtual environments. Game-TARS expresses every action in a unified action space anchored to native keyboard and mouse inputs, the same interface a human uses. This makes scalable, continuous pre-training possible across diverse environments such as operating systems, web interfaces, and simulation games.
Unlike traditional API-based or GUI-based action interfaces, this low-level action space does not need to be redefined for each new environment. As a result, Game-TARS can be pre-trained on more than 500 billion tokens drawn from diverse trajectories and multimodal data sources. This breadth of training helps the agent adapt to new scenarios and improves its performance on complex tasks.
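To make the idea of a unified, human-native action space concrete, here is a minimal sketch. The primitive names (`MouseMove`, `MouseClick`, `KeyPress`) and the text serialization format are hypothetical illustrations, not Game-TARS's actual action schema; the point is only that one small vocabulary of device-level events can drive very different environments.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical action primitives: every environment (OS, browser, game) is
# driven through the same low-level device events a human would produce.
@dataclass(frozen=True)
class MouseMove:
    dx: int  # relative horizontal movement in pixels
    dy: int  # relative vertical movement in pixels

@dataclass(frozen=True)
class MouseClick:
    button: str  # "left", "right", or "middle"

@dataclass(frozen=True)
class KeyPress:
    key: str              # e.g. "w", "space", "enter"
    duration_ms: int = 0  # 0 means a tap; >0 means hold the key this long

Action = Union[MouseMove, MouseClick, KeyPress]

def serialize(action: Action) -> str:
    """Render an action as text so it can share one token stream with observations."""
    if isinstance(action, MouseMove):
        return f"mouse_move(dx={action.dx}, dy={action.dy})"
    if isinstance(action, MouseClick):
        return f"mouse_click(button={action.button})"
    if isinstance(action, KeyPress):
        return f"key_press(key={action.key}, duration_ms={action.duration_ms})"
    raise TypeError(f"unknown action: {action!r}")

# The same vocabulary covers a web form and a first-person shooter.
browser_steps = [MouseMove(120, -40), MouseClick("left"), KeyPress("enter")]
fps_steps = [KeyPress("w", duration_ms=500), MouseMove(-15, 0), MouseClick("left")]

for step in browser_steps + fps_steps:
    print(serialize(step))
```

Because nothing in this vocabulary is specific to one application, trajectories from any environment can, in principle, be pooled into a single pre-training corpus.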
Key Innovations
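Game-TARS introduces two techniques to improve training and inference. The first is a decaying continual loss, which mitigates causal confusion: the tendency of an imitation-trained policy to latch onto cues that merely correlate with the correct action, such as its own previous action, instead of the observations that actually determine what to do. The second is a Sparse-Thinking strategy that balances reasoning depth against inference cost. Together, these techniques aim to make the agent both more accurate and more efficient on long, complex tasks.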
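The source does not give the formula for the decaying continual loss. As a minimal sketch, assume the per-step imitation loss is scaled by a hypothetical decay factor `gamma` for each consecutive repeat of the same action. Long runs of identical inputs (for example, holding a movement key) then contribute less, and steps where the action changes carry more weight. The function names and `gamma` are illustrative assumptions.

```python
import math
from typing import List, Sequence

def decay_weights(actions: Sequence[str], gamma: float = 0.5) -> List[float]:
    """Weight step t by gamma**k, where k is how many identical actions immediately precede it.

    Assumption: geometric decay over consecutive repeats; the actual
    Game-TARS formulation may differ.
    """
    weights: List[float] = []
    run = 0  # number of identical actions immediately before the current step
    for t, action in enumerate(actions):
        if t > 0 and action == actions[t - 1]:
            run += 1
        else:
            run = 0
        weights.append(gamma ** run)
    return weights

def decaying_continual_loss(
    target_probs: Sequence[float], actions: Sequence[str], gamma: float = 0.5
) -> float:
    """Weighted mean negative log-likelihood of the demonstrated actions.

    target_probs[t] is the model's probability for the demonstrated action at step t.
    """
    weights = decay_weights(actions, gamma)
    weighted_nll = sum(w * -math.log(p) for w, p in zip(weights, target_probs))
    return weighted_nll / sum(weights)

actions = ["w", "w", "w", "w", "jump", "w"]
print(decay_weights(actions))  # [1.0, 0.5, 0.25, 0.125, 1.0, 1.0]
```

With `gamma = 0.5`, an error on the fourth consecutive "w" costs one eighth as much as an error at the switch to "jump". This shifts the learning signal toward the steps where the observation, rather than the previous action, decides what to do.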
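The trade-off that Sparse-Thinking targets can be shown with a toy cost model: generate an explicit reasoning trace only at steps flagged as decision points, instead of at every step. The source does not say how Game-TARS decides when to reason. The scene-change trigger `is_decision_point` and the token costs below are hypothetical and chosen only to illustrate the cost accounting.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical per-step token costs (illustrative numbers, not from Game-TARS).
THOUGHT_TOKENS = 200  # cost of generating an explicit reasoning trace
ACTION_TOKENS = 10    # cost of generating the action alone

@dataclass
class Step:
    thought: bool  # whether the agent reasoned before acting at this step
    tokens: int    # tokens generated at this step

def is_decision_point(obs: str, prev_obs: Optional[str]) -> bool:
    """Hypothetical trigger: reason on the first step or whenever the scene changes."""
    return prev_obs is None or obs != prev_obs

def run_episode(observations: List[str]) -> List[Step]:
    steps: List[Step] = []
    prev: Optional[str] = None
    for obs in observations:
        think = is_decision_point(obs, prev)
        tokens = ACTION_TOKENS + (THOUGHT_TOKENS if think else 0)
        steps.append(Step(thought=think, tokens=tokens))
        prev = obs
    return steps

observations = ["corridor", "corridor", "corridor", "enemy", "enemy", "door"]
steps = run_episode(observations)
sparse_cost = sum(s.tokens for s in steps)
dense_cost = len(observations) * (ACTION_TOKENS + THOUGHT_TOKENS)  # reason at every step
print(sparse_cost, dense_cost)  # 660 1260
```

In this toy episode the agent reasons at 3 of 6 steps and generates roughly half the tokens of a reason-every-step policy. A real system still has to decide where the decision points are, a question this sketch sidesteps with a fixed rule.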
Performance Highlights
In experiments, Game-TARS achieved approximately twice the success rate of the previous state-of-the-art model on open-world Minecraft tasks. It also matched the adaptability of first-time human players in unseen web 3D games, and it outperformed GPT-5, Gemini-2.5-Pro, and Claude-4-Sonnet on FPS benchmarks. These results show the agent performing well across very different virtual environments.
Implications for AI Research
The results of Game-TARS suggest that a scalable, human-native action representation combined with large-scale pre-training is a promising route to generalist AI agents with broad problem-solving abilities. The same approach could extend to applications in gaming, simulation, and beyond.