Stop saying LLMs are just “next token predictors.”
Nothing shows me how little someone knows about AI (and related topics) faster than this statement:
I get what people mean when they drop a one-line comment on a post saying this. For many common LLMs, especially GPT-style autoregressive models, next-token prediction is core to both pretraining and generation. In the simplest case: train the model to predict the next token > generate one token at a time > wrap it in a larger system with prompts, decoding rules, tools, retrieval, memory, etc.
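To make that pipeline concrete, here is a minimal sketch of the generate-one-token-at-a-time loop (my sketch, not the post's code), assuming a Hugging Face-style causal LM; the gpt2 checkpoint and greedy argmax decoding are just stand-ins:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model/tokenizer; any causal LM checkpoint works the same way.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The chess engine considered", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                       # one token per step
        logits = model(input_ids).logits      # [batch, seq_len, vocab]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Everything interesting happens inside `model(...)`; the loop itself is trivial, which is exactly why "it generates one token at a time" tells you so little about what the model is doing.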
That’s true.
But saying LLMs are just next-token predictors is one of those statements that is technically grounded while being deeply misleading and damaging to lurkers who don’t know better.
It confuses the objective/interface with the learned system.
A trained model isn’t just its loss function. Saying “it predicts the next token” is like saying a chess engine “just picks the move with the best score,” or saying a musician “just plays the next note.” True, but an unbelievably weak argument. It skips over the thing we actually care about: what structure has been learned, what representations have formed, what computations the trained network appears to implement, and what capabilities result.
To predict text well at scale, a model is incentivized to learn representations that encode grammar, syntax, style, semantic relationships, factual regularities, code patterns, social conventions, discourse structure, and reasoning-like heuristics. Some of this is shallow pattern matching; some is memorization; some is brittle; some is spurious correlation, but some of it appears to be useful abstraction.
Yes, not perfectly, not the way humans do, and not with the same kind of embodiment, persistent memory, or agency, but also not in the shallow sense people imply with “autocomplete.”
When folks say “just next-token predictor,” they’re often implying a much stronger claim:
“It predicts the next token, therefore it doesn’t understand anything.”
“It predicts the next token, therefore it can’t reason.”
“It predicts the next token, therefore all apparent intelligence is fake.”
Those conclusions don’t follow.
Prediction can require modeling. If I ask you to predict the next …
- move in a chess game, the best predictor may need to represent the board, legal moves, threats, plans, and strategic context.
- line in a proof, the best predictor may need to track the logic.
- line of code, the best predictor may need to infer the goal, constraints, API behavior, and likely implementation.
Prediction doesn’t guarantee deep understanding, but it also doesn’t prevent it.
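A toy illustration of the point (my example, assuming nothing beyond standard Python): even in a trivial “language” of balanced parentheses, a perfect next-character predictor has to track nesting depth. Whether “)” is even possible depends on the whole prefix, not the last few characters, so a good predictor must carry hidden state:

```python
# Toy example: predicting the next character in balanced-parentheses strings.
# A perfect predictor must track nesting depth (a tiny world model), because
# whether ")" is a legal continuation depends on state, not on surface n-grams.
def valid_next_chars(prefix: str) -> set[str]:
    depth = 0
    for ch in prefix:
        depth += 1 if ch == "(" else -1
    options = {"("}            # we can always open another paren
    if depth > 0:
        options.add(")")       # closing is only legal if something is open
    return options

print(valid_next_chars("(()("))   # {'(', ')'}  depth 2, both legal
print(valid_next_chars("()"))     # {'('}       depth 0, ")" would be invalid
```

Scale that intuition up from parentheses to chess games, proofs, and codebases, and “it just predicts” stops sounding like an explanation.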
Whether LLMs “understand” depends partly on what someone means by understanding. If they mean consciousness, lived experience, sentience, agency, embodiment, or human-like mental states, then I don’t think current LLMs have that, and I don’t think we have good evidence that they do. But consciousness isn’t exactly a solved problem either, so I’d be careful about pretending this is settled by saying “lololol it predicts tokens.” The argument can’t just be “the objective is prediction, therefore understanding is impossible.” But it also can’t be “the outputs look intelligent, therefore it understands.”
People keep skipping this distinction.
LLMs can feel like magic, but they aren’t magic. I don’t think we have good evidence that current LLMs are conscious, sentient, or have lived experience: they hallucinate, they’re brittle, they can produce reasoning-like outputs without reliably generalizing, and they often need tools, retrieval, verification, and human oversight. But that isn’t the dunk people think it is. Humans also need tools, notes, calculators, routines, peer review, PR reviews, editors, mentors, and institutional scaffolding. The point is not that humans are unscaffolded minds while LLMs are fake because they need support; the point is that LLMs have different failure modes, grounding, memory, agency, and accountability structures.
But “just next-token prediction” by itself isn’t a serious analysis of those limitations. It’s a factually defensible phrase meant to lol @ something while being stapled to a bad inference. The phrase is true enough to get upvotes, but the implication is wrong enough to make the conversation worse.
“Next-token predictor” describes the training objective and generation interface of many LLMs, but it doesn’t entirely describe what the trained model has learned, what it can do, or how larger AI systems built around such models behave when connected to tools, memory, retrieval, code execution, agent loops, and feedback mechanisms.
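For concreteness about what the phrase does describe, here is a hedged sketch of that training objective, next-token cross-entropy, in PyTorch (shapes and names are illustrative assumptions, not the post's code):

```python
import torch
import torch.nn.functional as F

# Next-token cross-entropy: the "predict the next token" objective.
# logits: model outputs, shape [batch, seq_len, vocab_size]
# tokens: input token ids,  shape [batch, seq_len]
def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # Position t predicts token t+1, so shift targets by one.
    pred = logits[:, :-1, :].reshape(-1, logits.size(-1))
    target = tokens[:, 1:].reshape(-1)
    return F.cross_entropy(pred, target)

logits = torch.randn(2, 8, 50_000)           # dummy model output
tokens = torch.randint(0, 50_000, (2, 8))    # dummy token ids
print(next_token_loss(logits, tokens))       # scalar loss
```

The objective is this one scalar. Everything the post argues about (the representations, the heuristics, the capabilities) is what gradient descent builds inside the model to drive that scalar down.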
For the love of god, just stop saying it. “They are just next-token predictors” is reductionist in exactly the wrong way; it makes people feel like they’ve explained the system when they’ve only named one part of it.
Key Takeaways
- Say “LLMs can predict text” instead of “they are just next-token predictors.”
- Understand that prediction is the objective and interface, not the whole system. LLMs learn a wide range of representations and heuristics in service of this task.
- Be cautious when making strong claims about what LLMs can or cannot do based on their training objectives.
Originally published at reddit.com.




