Why Specialization Is Inevitable

Goldfeder, Wyder, LeCun, and Shwartz-Ziv argue that specialization is the only logical path for effective AI

Four separate fields — optimisation theory, evolutionary biology, competitive markets, and machine learning — arrive at the same conclusion. A system built to fit a specific target will outperform one that tries to cover everything.

This analysis draws on the 2026 paper “AI Must Embrace Specialization via Superhuman Adaptable Intelligence”. The authors provide a structural case for why narrow focus beats broad capability. The framing and synthesis here are Dharma AI’s.

—

The math says breadth does not win

Wolpert and Macready proved in 1997 that no single general-purpose optimisation algorithm beats all others across every possible problem. The proof is mathematical. Averaged across every conceivable scenario, every algorithm performs equally well and equally poorly. Gains on one set of problems are simply redistributed across others. The total performance does not increase.
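The averaging claim can be checked by brute force on a toy problem. The sketch below is an illustration only, not the construction used by Wolpert and Macready or by the paper. It enumerates every function from a three-point domain to {0, 1} and compares two hypothetical search strategies, `fixed_order` and `adaptive`, on the best value found after m evaluations. The domain size, the two strategies, and the performance measure are all assumptions chosen to keep the enumeration small.

```python
from itertools import product

DOMAIN = (0, 1, 2)   # assumed toy search space
VALUES = (0, 1)      # assumed objective values (higher is better)


def fixed_order(history):
    """Hypothetical strategy: visit unvisited points in ascending order."""
    visited = {x for x, _ in history}
    for x in DOMAIN:
        if x not in visited:
            return x


def adaptive(history):
    """Hypothetical strategy: start at 2, then pick based on the last value seen."""
    if not history:
        return 2
    visited = {x for x, _ in history}
    candidates = [x for x in DOMAIN if x not in visited]
    last_value = history[-1][1]
    return candidates[0] if last_value == 1 else candidates[-1]


def best_after(algorithm, f, m):
    """Best objective value found after m distinct evaluations of f."""
    history = []
    for _ in range(m):
        x = algorithm(history)
        history.append((x, f[x]))
    return max(y for _, y in history)


def average_performance(algorithm, m):
    """Average of best_after over every possible function DOMAIN -> VALUES."""
    functions = list(product(VALUES, repeat=len(DOMAIN)))
    return sum(best_after(algorithm, f, m) for f in functions) / len(functions)


for m in (1, 2, 3):
    print(m, average_performance(fixed_order, m), average_performance(adaptive, m))
```

Averaged over all eight functions, the two strategies tie at every evaluation budget. On the single function (0, 0, 1), the adaptive strategy finds the maximum on its first evaluation and the fixed-order strategy does not. The advantage exists only relative to a particular target.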

The practical result is clear: an algorithm wins by being a good fit for the target problem. Generality is not a performance advantage. The structural path to outperformance is concentration. You trade breadth for fit.

Finite resources make this sharper. Real systems operate under limits. They have finite compute, finite data, and finite development time. Given a fixed budget, directing resources toward a finite set of tasks beats spreading them across an unlimited range. As the task set expands without bound, the resources available per task shrink toward zero. Universal coverage and meaningful performance are in direct tension when resources are limited.

The conclusion is operational, not philosophical. As the paper states, “universal generality is a theoretical concept, but in practical terms it is a myth”. The system that survives contact with real constraints is the one that fits its target, not the one that tries to do everything.

—

Biology and markets knew this first

Two other domains arrived at the same prediction before optimisation theory gave it a name.

In biology, every performance gain in one niche comes at a cost elsewhere. A generalist carries traits suited to many environments but optimal for none. Competence spreads too thin to dominate any particular condition. There are no performance gains without trade-offs. The resources invested in one capability are unavailable for another. Selection favours designs matched to local conditions over those optimised for uniform coverage across all possible environments. Organisms that survive to reproduce are not the most generally capable; they are the most specifically matched. The result, accumulated over evolutionary timescales, is specialists filling niches. As the paper states: “Specialization is not an accident of biology; it is a predictable consequence of limited resources, competing objectives, and environments that reward performance on a small subset of evolutionarily relevant challenges”.

Competitive markets follow the same dynamic through different means. Organisations and strategies that fail to meet performance thresholds are eliminated, not through extinction but through exit, defunding, and replacement by better-matched alternatives. Competition acts as a selection mechanism. It amplifies effective strategies and eliminates ineffective ones. The mechanism differs from biological selection in almost every detail. There is no genetic inheritance, no mutation, no evolutionary timescale. The unit of selection is the organisation, the product, or the strategy. Yet the structural pressure is the same: finite resources, performance requirements, and the systematic removal of entities too broadly distributed to excel where it counts. Concentrated capacity outcompetes distributed capacity when performance standards are clear and consistent.

Evolution and markets operate through entirely different mechanisms. They use different timescales, different units of selection, and different inheritance methods. Yet both produce the same outcome under resource pressure: fit over breadth. The theorem predicts this. Biology and markets arrive at it independently. When a third domain, machine learning, arrives at the same finding through different means entirely, the pattern ceases to look like the consequence of a single theorem and begins to look like something more general about how constrained systems behave.

—

Machine learning keeps rediscovering the same lesson

The same pattern has emerged inside machine learning — not derived from optimisation theory, but arrived at through the accumulated experience of building systems and watching what improves them.

The clearest form is negative transfer: the measurable drop in per-task performance that occurs when a system is trained on multiple tasks that compete rather than cooperate. When tasks share structure, training together helps. But when tasks compete for representational capacity, or impose conflicting gradients during training, performance on individual tasks falls below what a dedicated system would achieve. The gain from breadth becomes a cost to depth. This cost is a documented consequence of dividing finite capacity across tasks that pull against each other. The specialist, facing no such competition, does not pay it.
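Conflicting gradients can be made concrete with a deliberately small example. The sketch below is not taken from the paper. It assumes two hypothetical quadratic task losses that share one two-dimensional parameter vector, with targets `TASK_A` and `TASK_B` chosen to pull in nearly opposite directions. It measures the cosine similarity of the two task gradients, then compares the loss on task A after joint training with the loss after dedicated training.

```python
import math


def loss(w, target):
    """Assumed toy task loss: half the squared distance to the task's target."""
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def grad(w, target):
    """Gradient of the toy loss with respect to the shared parameters w."""
    return [wi - ti for wi, ti in zip(w, target)]


def cosine(u, v):
    """Cosine similarity; negative values mean the gradients conflict."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def train(start, targets, steps=200, lr=0.1):
    """Plain gradient descent on the sum of the losses for the given targets."""
    w = list(start)
    for _ in range(steps):
        total = [0.0] * len(w)
        for target in targets:
            total = [t + g for t, g in zip(total, grad(w, target))]
        w = [wi - lr * gi for wi, gi in zip(w, total)]
    return w


TASK_A = (1.0, 0.0)    # hypothetical optimum for task A
TASK_B = (-1.0, 0.2)   # hypothetical optimum for task B, nearly opposite
START = (0.0, 0.0)

conflict = cosine(grad(START, TASK_A), grad(START, TASK_B))
joint = train(START, [TASK_A, TASK_B])   # one system, both tasks
dedicated = train(START, [TASK_A])       # one system, task A only

print("gradient cosine at start:", conflict)
print("task A loss, joint training:", loss(joint, TASK_A))
print("task A loss, dedicated training:", loss(dedicated, TASK_A))
```

In this toy setting the jointly trained parameters settle between the two targets, so the loss on task A stays well above what the dedicated run reaches. Real multi-task systems are far more complex, and tasks with aligned gradients would show the opposite effect.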

The architecture of frontier models offers a different form of evidence. Mixture-of-experts systems achieve their breadth not through uniform generality across all parameters, but by routing each input to a specialised subset of the network. Different experts activate for different tasks. The paper’s authors read this as a structural concession. A system designed to be general achieves its results by recovering specialisation internally. This is an argued interpretation, not a demonstrated theorem. These architectures were designed for computational efficiency. What they imply about generality’s limits is a reasonable inference rather than a stated intent. But it is a notable one. The most capable general-purpose systems reach their performance by doing internally what specialist systems do by design.
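The routing idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of any particular model: the gate here is a fixed linear scorer (`GATE_WEIGHTS`), the experts are hypothetical one-line functions, and `route` is an illustrative name. Production mixture-of-experts layers use learned gates and additional machinery such as load balancing, which is omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route(x, gate_weights, experts, k=1):
    """Send input x to the k experts with the highest gate scores.

    Returns the gate-weighted output and the indices of the experts used.
    """
    # One linear gate score per expert (fixed here; real gates are learned).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate over the chosen experts only.
    weights = softmax([scores[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Hypothetical experts, each suited to one kind of input.
EXPERTS = [
    lambda x: 2.0 * x[0],    # expert 0: specialised on the first feature
    lambda x: -1.0 * x[1],   # expert 1: specialised on the second feature
]
GATE_WEIGHTS = [
    [1.0, 0.0],   # expert 0 scores high when the first feature dominates
    [0.0, 1.0],   # expert 1 scores high when the second feature dominates
]

for x in [(3.0, 0.0), (0.0, 3.0)]:
    output, chosen = route(x, GATE_WEIGHTS, EXPERTS, k=1)
    print(x, "-> experts", chosen, "output", output)
```

With `k=1`, each input activates only one expert, so most of the layer's parameters sit idle for any given input. That is the sense in which breadth at the system level is built from specialised parts.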

The clearest historical example follows the same logic. AlphaFold achieved a step change in protein structure prediction by targeting that specific task with task-specific architecture and training choices. Its gains came from narrower focus, not broader coverage. The paper uses AlphaFold as an archetypal case. It is not evidence that all specialised systems achieve equivalent gains, but an unusually clear illustration of the mechanism. That mechanism has appeared repeatedly. The history of AI milestones frequently reflects intense domain targeting rather than broad competence, even when the results look like demonstrations of general intelligence.

Three distinct places. Three different mechanisms. The same finding.

—

Scaling does not change the constraint

The picture would be incomplete without addressing one of AI research’s most cited observations. Sutton’s Bitter Lesson holds that methods relying on domain knowledge are consistently outperformed by methods that scale computation. On its face, this appears to complicate the case for specialisation. If scale and generality win, perhaps specialisation is only a useful heuristic under resource constraints that will ease as compute becomes cheaper.

The objection rests on a conflation between two distinct concepts. Domain knowledge refers to hand-coded features, engineered priors, and rules designed to give a system insight into a particular area. The Bitter Lesson targets this. Systems that encode explicit domain knowledge have been consistently outperformed as scale increases.

Domain specialisation is different. It is the decision to direct a system’s resources, architecture, and training toward a bounded set of tasks rather than distributing them broadly. This is not the encoding of knowledge about a domain. It is a decision about scope.

The paper draws the distinction precisely:

“The diminishing usefulness of domain knowledge is distinct from the usefulness of domain specialisation. As scaling progresses, we will need to know less about proteins to build a system that does protein folding; however, such a system still benefits from focusing specifically on proteins.”

Scaling changes what systems can learn from data. It does not change whether concentrating resources on a finite task set outperforms distributing them across an unlimited range. The Bitter Lesson and the specialisation argument operate on different dimensions. One describes how knowledge should be acquired. The other describes what a system should be pointed at. Both can be true simultaneously, because cheaper compute does not dissolve the constraint that makes fit more valuable than breadth.

—

Across four analytical traditions, the same pattern emerged through different paths. This is not a coincidence that demands explanation. It is the evidence.

When finite resources meet selection pressure — in an optimisation problem, an ecosystem, a market, or a training run — fit consistently beats breadth. The specific mechanisms differ. The timescales differ. The units of selection differ. But the structural dynamic is the same, and it produces the same result.

The theorem does not cause this pattern in biology. Biology does not cause it in markets. Neither causes it in machine learning. They all face the same underlying constraint: performance under scarcity requires concentration.
