Game-TARS: A Generalist AI Agent Masters Games and Beyond with Human-Native Interaction

ByteDance Seed has introduced Game-TARS, a generalist game agent trained with a unified action space anchored to native keyboard and mouse inputs. Because the agent acts through the same inputs a human uses, rather than through game-specific APIs or GUI-level commands, it can be pre-trained continually and at scale across diverse environments, including operating systems, web interfaces, and simulation games. Game-TARS is pre-trained on over 500 billion tokens drawn from varied trajectories and multimodal data sources. Two techniques are highlighted: a decaying continual loss function that mitigates causal confusion, and an efficient Sparse-Thinking strategy that balances reasoning depth against inference cost.

In experiments, Game-TARS achieved approximately twice the success rate of the previous state-of-the-art model on open-world Minecraft tasks. It also came close to the adaptability of first-time human players in unseen web 3D games and surpassed GPT-5, Gemini-2.5-Pro, and Claude-4-Sonnet on FPS benchmarks. Scaling experiments confirm that the unified action space sustains these improvements when training is extended to cross-game and multimodal data. The results suggest that scalable action representations, combined with large-scale pre-training, are a promising route to generalist agents with broad problem-solving capabilities across diverse virtual environments.
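To make the two central ideas more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Game-TARS implementation: the summary above does not specify the action format or the form of the loss. The action classes (`KeyPress`, `MouseMove`, `MouseClick`), the `decayed_weights` helper, and the decay factor `gamma` are all hypothetical names chosen for this example. The sketch assumes one plausible reading of a "decaying" loss, in which consecutive repeats of the same action receive geometrically smaller training weight so that a model is not rewarded simply for copying its previous action.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical unified action space: every environment is driven by the
# same low-level keyboard and mouse primitives.
@dataclass(frozen=True)
class KeyPress:
    key: str  # e.g. "w" or "space"


@dataclass(frozen=True)
class MouseMove:
    dx: int  # relative horizontal movement in pixels
    dy: int  # relative vertical movement in pixels


@dataclass(frozen=True)
class MouseClick:
    button: str  # "left" or "right"


Action = Union[KeyPress, MouseMove, MouseClick]


def decayed_weights(actions: List[Action], gamma: float = 0.5) -> List[float]:
    """Return one loss weight per action (assumed scheme, not from the source).

    The first action in a run of identical actions gets weight 1.0; the k-th
    consecutive repeat gets weight gamma ** k.
    """
    weights: List[float] = []
    repeats = 0
    previous = None
    for action in actions:
        # Count how many times in a row this exact action has been repeated.
        repeats = repeats + 1 if action == previous else 0
        weights.append(gamma ** repeats)
        previous = action
    return weights


trajectory: List[Action] = [
    KeyPress("w"),
    KeyPress("w"),
    KeyPress("w"),
    MouseClick("left"),
    KeyPress("w"),
]
weights = decayed_weights(trajectory)
print(weights)  # [1.0, 0.5, 0.25, 1.0, 1.0]
```

In a training loop, weights like these would multiply the per-step loss before averaging. The exact schedule used by Game-TARS may differ.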