Turing Award winner Richard Sutton says pure generative AI can’t do real science
Turing Award winner Richard Sutton argues that ordinary generative AI lacks a key ability for scientific discovery: it can’t evaluate and develop its own results.
Large language models, image generators, and video models learn from massive numbers of examples and produce outputs that resemble them. According to Sutton, when these outputs are good, it’s usually thanks to the source material: the texts, images, or data the model learned from. When the outputs are truly novel, they go beyond that material, and in answers to factual queries that kind of novelty is called hallucination.
Sutton illustrates his critique with an old researcher’s joke: “This work is both novel and good. Unfortunately, the parts that are good are not novel, and the parts that are novel are not good.” That diagnosis fits large parts of today’s generative AI, Sutton says. It can mimic useful things or randomly produce new things, but it can’t tell on its own which new ideas are actually good.
Sutton doesn’t deny that generative AI can be useful for summaries, research, assistants, or entertainment. Novelty often isn’t even the goal: a summary shouldn’t invent new facts, and research shouldn’t sneak in extra claims. “Generative AI can be extremely useful, even when it just mimics, if it is faster, or cheaper, or smaller, or more customizable, or more copy-able, than the thing being mimicked,” Sutton says.
Imitation falls short for science
In Sutton’s view, this limit matters most in science, where the point isn’t to reproduce what’s already known but to discover new things, test them, and turn them into lasting knowledge.
Sutton describes genuine discovery as a three-step process: variation, evaluation, and selective retention. A system has to generate different options, test them, and keep using the approaches that work. Sutton says this principle exists in evolution, in the scientific method, in planning, in search, and in reinforcement learning.
What pure generative AI lacks most is evaluation. Language and image models do generate different variants. But without testing, there’s no selection of the best and no discovery. “The novelty flickers into existence, but if its value is unrecognized, it flickers away and is lost,” Sutton says.
Evaluation can come from humans, for example, when users pick the best image from several AI-generated options. But it can also come from a clear goal: a checkmate, a formally valid proof, a successful program run, or a high reward in a simulated environment. Only that kind of feedback turns mere generation into a search and discovery process.
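The loop of variation, evaluation, and selective retention can be illustrated with a toy search. This is only a minimal sketch, not Sutton's method or any of the systems named in this article: the function names, the character-matching score, and the target string are all hypothetical stand-ins for a "clear goal" that can be checked automatically.

```python
import random
import string

# Hypothetical search space: lowercase letters, digits, and the space character.
ALPHABET = string.ascii_lowercase + string.digits + " "


def evaluate(candidate: str, target: str) -> int:
    """Evaluation: count positions where the candidate matches the goal."""
    return sum(a == b for a, b in zip(candidate, target))


def vary(parent: str, rng: random.Random) -> str:
    """Variation: copy the parent and replace one randomly chosen character."""
    i = rng.randrange(len(parent))
    return parent[:i] + rng.choice(ALPHABET) + parent[i + 1:]


def discover(target: str, steps: int = 20000, seed: int = 0):
    """Generate variants, score them, and keep only those that are no worse."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in target)
    best_score = evaluate(best, target)
    for _ in range(steps):
        candidate = vary(best, rng)
        score = evaluate(candidate, target)
        # Selective retention: a variant survives only if evaluation says
        # it is at least as good as what we already have.
        if score >= best_score:
            best, best_score = candidate, score
        if best_score == len(target):
            break
    return best, best_score


if __name__ == "__main__":
    print(discover("move 37"))
```

Removing the `evaluate` call and the `if score >= best_score` check leaves only the variation step: the loop would still produce new strings, but nothing would tell it which ones to keep, which is the gap the article describes.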
AlphaGo, AlphaFold, and Claude Code show the difference
Sutton says some AI systems that go beyond pure generative AI are already “capable of true creativity and true discovery.” He points to examples like AlphaGo with its famous move 37, AlphaZero with its unique chess style, AlphaFold in protein structure prediction, AlphaProof in math, Claude Code in programming, and GT Sophy in simulated racing.
What these systems share is an evaluation loop that goes beyond pure text or image generation. A Go move either raises the chance of winning or it doesn’t. A math step can be formally checked, or it can’t. Code passes tests, runs correctly, or fails. This makes it possible to select and pursue better solutions.
“All these systems have some additional features that make them capable of true creativity and true discovery,” Sutton says.
Sutton’s critique explicitly targets “ordinary” generative AI: models that don’t evaluate their own output at runtime. Language models extended with search, verifiers, tools, reinforcement learning, or formal validators can become part of genuine discovery systems. But how far that structure can stretch beyond programming, games, and clearly testable tasks remains an open question.
Sutton sees another issue in how neural networks are trained. Standard networks start with random settings and then learn from data. That initial randomness is a source of variation, but it mostly happens at the beginning. Over time, models can lose their ability to learn as their internal structures get rigid.
A truly learning system shouldn’t just be trained once, Sutton argues. It would need to renew its structure on an ongoing basis: try new possibilities, keep what works, and discard what doesn’t. His goal is an AI that manages variation, evaluation, and selective retention on its own over long stretches of time. “Let’s fully automate Creativity and Discovery!” he says.
Sutton has been critical of the AI industry’s direction for a while
Sutton recently criticized the AI industry more broadly, saying it has “lost its way.” The researcher is mainly pushing back against the heavy focus on ever-larger language models that absorb vast knowledge during training but don’t learn from their own experience over time.
Instead, Sutton calls for AI agents that interact with their environment continuously, learn from it, build internal models of the world, and plan new strategies. Meta-learning also factors into his vision: systems should learn how to learn better instead of just mimicking individual tasks.
In his Oak architecture, Sutton lays out a possible path to powerful AI systems. The core idea is that agents start with no built-in specialist knowledge, act in an environment, get feedback, and form increasingly abstract concepts over time. Useful concepts become the foundation for the next stage of learning.
The big open prerequisite for this, Sutton says, is reliable continual learning. Today’s neural networks often struggle to absorb new knowledge without overwriting old knowledge or losing the ability to adapt.