Why AI Still Can’t Beat a New Video Game
Lessons from decades of game-playing machines suggest artificial general intelligence remains far away.
For decades, video games have served as a proving ground for artificial intelligence. From early checkers programs to systems that conquered chess and Go, each milestone has seemed to bring machines closer to human-like intelligence. But a new paper by Julian Togelius and colleagues argues that this narrative is misleading. Despite impressive victories, today’s AI still struggles with a deceptively simple challenge: playing a game it has never seen before.
Most headline-grabbing successes in game AI rely on systems that are finely tuned to a single game. These systems can achieve superhuman performance, but only within narrow boundaries. Change the rules, visuals or environment even slightly, and their competence can collapse.
This brittleness reveals a deeper limitation. Intelligence, as humans experience it, is not just about mastering one task but adapting to new ones. Video games, with their enormous variety of mechanics and goals, offer an unusually rich testbed for that kind of flexibility. As the authors note, games collectively probe a wide range of cognitive skills, from spatial reasoning and long-term planning to social intuition and learning through trial and error. Yet modern AI systems fall short on this broader challenge.
One major approach, reinforcement learning, has powered many recent breakthroughs. These systems learn by trial and error, improving through millions — or billions — of simulated plays. But they tend to overfit, becoming experts at the exact scenarios they were trained on while failing to generalize. Even minor changes, such as shifting colors or positions on a screen, can render a trained agent ineffective.
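The trial-and-error loop described above can be sketched in miniature. The toy below is tabular Q-learning on a five-cell line world, purely illustrative and not any system from the paper: the agent learns, state by state, that stepping right reaches the goal. Because the learned values are keyed to exact states, the table is useless the moment the layout changes, which is the overfitting problem in its simplest form.

```python
import random

# Toy line world: agent starts at cell 0, goal at cell 4.
# Illustrative Q-learning sketch, not a system from the paper.
N = 5
GOAL = N - 1
ACTIONS = [-1, +1]  # step left or right

def step(state, action):
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Value table keyed to EXACT (state, action) pairs -- the root of overfitting.
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit current values, sometimes explore.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            s2, r, done = step(s, a)
            best_next = max(q[(s2, a2)] for a2 in ACTIONS)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = train()
# The greedy policy learned from the table: step right from every non-goal cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
```

Relabel the cells, mirror the world, or add a sixth cell, and every entry in `q` refers to a state the agent has never valued, so the policy collapses exactly as the paragraph above describes.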
Planning-based systems, such as those used in chess or Go, offer more generality. They simulate possible moves and outcomes rather than relying on prior training. But they depend on fast, accurate simulations — something that most modern video games, and certainly the real world, cannot provide at scale.
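A minimal planner makes the contrast concrete. The sketch below solves single-pile Nim (take one to three stones; whoever takes the last stone wins) by recursively simulating every move, an illustrative stand-in for the search used in chess and Go engines. Note what it requires: a perfect, instant simulator of the rules, which is exactly what most video games and the real world fail to supply.

```python
from functools import lru_cache

# Minimax-style planner for single-pile Nim. No training data: the agent
# plays perfectly by simulating moves with an exact model of the rules.
MOVES = (1, 2, 3)

@lru_cache(maxsize=None)
def can_win(stones):
    # A position is winning if some legal move leaves the opponent losing.
    return any(m <= stones and not can_win(stones - m) for m in MOVES)

def best_move(stones):
    # Pick any move that hands the opponent a losing position.
    for m in MOVES:
        if m <= stones and not can_win(stones - m):
            return m
    return 1  # losing position anyway: play any legal move

# Known result: piles that are a multiple of 4 are losses for the player to move.
```

Swap in a new game and the planner still works, as long as a `step`-like simulator exists and is cheap to call; that dependency, not the search itself, is the bottleneck the paragraph above identifies.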
Large language models, the technology behind today’s most visible AI tools, might seem like a promising alternative. After all, they can write essays, generate code and solve complex reasoning tasks. But when it comes to playing unfamiliar games, they perform surprisingly poorly.
Even in cases where language models appear to succeed — such as playing well-known games — the results often rely on elaborate, game-specific scaffolding. Systems are augmented with tools to interpret game states, manage memory and execute actions. Strip away this custom infrastructure, and performance drops sharply.
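The shape of that scaffolding can be sketched in a few lines. Everything below except the model call is custom infrastructure: describing the game state as text, parsing free-form replies back into legal moves, and carrying memory between turns. The `query_model` function here is a hypothetical stub, not any real API; a real system would call a language model at that point.

```python
# Hypothetical sketch of LLM agent scaffolding. `query_model` is a stub
# standing in for a real language-model call.
def query_model(prompt: str) -> str:
    return "MOVE right"  # stub reply; a real model would answer the prompt

VALID_ACTIONS = {"left", "right", "up", "down"}

def describe_state(state: dict) -> str:
    # Scaffolding step 1: translate raw game state into text the model can read.
    return f"You are at {state['pos']}; the goal is at {state['goal']}."

def parse_action(reply: str) -> str:
    # Scaffolding step 2: coerce free-form text into a legal game action.
    for token in reply.lower().split():
        if token in VALID_ACTIONS:
            return token
    return "noop"  # model named no legal move

def agent_step(state: dict, memory: list) -> str:
    # Scaffolding step 3: manage memory across turns and assemble the prompt.
    prompt = "\n".join(memory + [describe_state(state), "Reply with one move."])
    reply = query_model(prompt)
    memory.append(reply)
    return parse_action(reply)
```

Strip out `describe_state`, `parse_action`, and the memory list, and the model is left mapping text to text with no grip on the game at all, which is the point the paragraph above makes.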
The gap likely stems from the nature of the training data. Language models are trained on vast amounts of text, not on sequences of game states and actions. As a result, they lack the embodied understanding and interactive experience that games demand.
The authors suggest that truly general game-playing ability would require something very different: an AI that can learn a new game from scratch in roughly the same time it takes a skilled human — perhaps tens of hours — without relying on prior exposure or massive simulation.
That benchmark is far beyond current capabilities. Today’s reinforcement learning systems require far more data, while language models lack the mechanisms to accumulate and refine knowledge over extended interaction. Bridging this gap would likely demand entirely new architectures and learning paradigms.
The implications extend well beyond gaming. The ability to adapt to unfamiliar situations is central to the idea of artificial general intelligence (AGI). If an AI cannot handle a novel video game — a controlled, simplified environment — it is unlikely to cope with the unpredictability of the real world.
The paper offers a different perspective on one area where AI does excel: computer programming. Coding, the authors argue, can be viewed as a kind of “game” with clear rules, well-defined goals and immediate feedback through debugging and testing. Modern AI systems have effectively mastered this particular game because they have been trained extensively on its structure and data.
But outside such well-structured domains, their abilities remain limited.
Ultimately, the researchers propose that games should remain central to AI evaluation: not as isolated challenges, but as a vast, evolving ecosystem of tests for adaptability and creativity. A truly intelligent system would not only learn to play new games efficiently but might even invent compelling ones of its own.