AI Game Playing and Rule Following

Source: scientificamerican.com

Published on June 16, 2025

Testing AI with Games

Building reliable and effective AI models requires evaluating how well they adapt to and follow new rules, and a natural place to start is testing their ability to play games on the fly.

As Scientific American’s longtime columnist Martin Gardner showed, even a simple game like tic-tac-toe has surprisingly complex variations. Gardner’s games may offer insight into making artificial intelligence more humanlike, because playing a game involves imagining, understanding, and following rules. Navigating rules is precisely where AI models struggle as they begin to approach humanlike reasoning. The ultimate objective of machine-learning and AI research is artificial general intelligence, and reaching it will require AIs that can interpret, adapt to, and rigorously follow whatever rules they are given.

The Gardner Test

To foster the development of such AI, a new evaluation is needed, call it the Gardner test: an AI is presented with the rules of a game, disclosed only at the start of play, and must then play by those rules without assistance.
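A minimal sketch of what a Gardner-test harness might look like, assuming a tic-tac-toe referee and a stub player standing in for an LLM (all function names and the interface are hypothetical, not part of any proposed standard): the rules are handed to the player as plain text when the game begins, and an independent referee checks every proposed move for legality.

```python
# Hypothetical Gardner-test harness sketch: rules arrive as natural-language
# text at the start of play; the referee checks legality only (win detection
# is omitted for brevity).

TIC_TAC_TOE_RULES = """Players alternate placing X and O on a 3x3 grid.
A move names an empty cell by index 0-8. Three in a row wins."""

def legal_moves(board):
    """Indices of cells still empty on the 3x3 board."""
    return [i for i, cell in enumerate(board) if cell == " "]

def referee(player, rules_text):
    """Play one game; fail the test on the first rule violation."""
    board = [" "] * 9
    mark = "X"
    for _ in range(9):
        move = player(rules_text, board[:], mark)
        if move not in legal_moves(board):
            return False  # illegal move: the player broke the stated rules
        board[move] = mark
        mark = "O" if mark == "X" else "X"
    return True  # every move conformed to the rules

# Stub player standing in for an LLM prompted with the rules text.
def first_empty_player(rules_text, board, mark):
    return next(i for i, c in enumerate(board) if c == " ")
```

The key design point is that the player receives only `rules_text` and the current state; nothing about the game is baked into the agent ahead of time, which is what separates this setup from conventional game-playing systems.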

The Gardner test builds on general game playing (GGP), a field shaped by Michael Genesereth at Stanford University. In GGP competitions, AIs compete in games with rules revealed at the start. The proposed test advances this by accepting game rules expressed in natural language. This is now within reach because of breakthroughs in large language models (LLMs) such as ChatGPT, Claude, and Llama.

The proposed evaluation should include tests focused on games such as Connect Four, Hex, and Pentago, and should also use games that Gardner wrote about. The design of the test could involve the GGP research community, AI model developers, and Martin Gardner fans.

Adapting to New Rules

Passing the new test requires an AI system that can master any strategy game on the fly. Strategy games demand thinking ahead, handling unpredictable responses, adapting to shifting objectives, and conforming to a strict rule set. Current AI models depend on knowing the rules in advance in order to train their algorithms. AlphaZero, for instance, learns through self-play, but the rules must be fixed before training begins: it can master the games it is trained on, yet it cannot play a game it has never learned.
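The contrast can be made concrete with a small sketch (all class and method names here are hypothetical illustrations, not real APIs): an AlphaZero-style learner must be handed a complete rules engine before training ever starts, whereas a Gardner-test agent only receives the rules, as text, at the moment play begins.

```python
# Hypothetical interfaces illustrating where the rules enter each system.

class SelfPlayLearner:
    """AlphaZero-style: the complete rules engine is fixed before training."""
    def __init__(self, rules_engine):
        self.rules = rules_engine   # rules baked in up front
    def train(self, num_games):
        pass                        # millions of self-play games would run here

class GardnerAgent:
    """Gardner-test style: rules arrive, as text, only when play begins."""
    def choose_move(self, rules_text, state):
        raise NotImplementedError   # must interpret rules_text on the fly
```

The difference is purely in the signatures: one system's competence is bound to a rules engine chosen at construction time, while the other must derive legal play from a description it has never seen.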

An AI model that performs well on the proposed test would be able to adapt to new rules even without prior data, playing any game and following any novel rule set with precision. A general intelligence that passes the Gardner test would follow the rules exactly, whereas specialized tools tend to reproduce patterns from their training data rather than obey the rules in front of them.

It’s easy to imagine real-world scenarios in which such errors could be catastrophic: in a national security context, for instance, AI capabilities are needed that can accurately apply rules of engagement dynamically or negotiate subtle but crucial differences in legal and command authorities. In finance, programmable money is emerging as a new form of currency that can obey rules of ownership and transferability—and misapplying these rules could lead to financial disaster.

Building AI systems that can follow rules rigorously would make it possible to create machine intelligences that are far more humanlike in their flexibility and ability to adapt to uncertain and novel situations. Game playing with a novel set of rules is crucial to the next evolution of AI because it will potentially let us create AIs that will be capable of anything—but that will also meticulously and reliably follow the rules we set for them.

If we want AI that is both powerful and safe, testing its ability to play games on the fly may be the best path forward.