Can tech companies learn to love cheaper AI models?

By AI Maestro, June 9, 2026

For creators and builders, the era of blindly chasing the biggest brain is ending. The old rule that scale equals superiority is cracking, forcing makers to ask whether their projects truly need the most expensive models or whether a leaner, cheaper alternative will do just as well.

As prices climb, the industry is finally giving smaller options serious consideration. Cost-conscious model selection is new territory for most companies, but it could reshape the economics of the sector.

The shift to the 99% cheaper option

Brian Armstrong, co-founder of Coinbase, has outlined a clear trajectory: the vast majority of tasks will migrate to significantly cheaper models.

“[D]emand for intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12-18 months,” Armstrong wrote on X. “20% of workloads will still run on latest gen models where IQ maxing is important.”

If this prediction holds, the economic foundation of the AI sector will undergo a seismic transformation.
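
A rough back-of-the-envelope calculation gives a feel for the scale of that prediction. The sketch below assumes that workloads are measured by their share of current spend and that every workload runs on a frontier model today; neither assumption comes from Armstrong's post, and the function name is purely illustrative.

```python
def blended_cost_ratio(cheap_share: float, cheap_discount: float) -> float:
    """Return projected spend as a fraction of current spend.

    cheap_share:    fraction of workloads moved to cheaper models (0.80 in the quote)
    cheap_discount: how much cheaper those models are (0.99 means 99% cheaper)

    Assumption: every workload currently costs the same per unit on a
    frontier model, so shares of workloads equal shares of spend.
    """
    if not (0.0 <= cheap_share <= 1.0 and 0.0 <= cheap_discount <= 1.0):
        raise ValueError("shares and discounts must be between 0 and 1")
    cheap_part = cheap_share * (1.0 - cheap_discount)  # moved workloads at the reduced price
    frontier_part = 1.0 - cheap_share                  # remaining workloads at full price
    return cheap_part + frontier_part


ratio = blended_cost_ratio(cheap_share=0.80, cheap_discount=0.99)
print(f"Projected spend: {ratio:.1%} of today's level")
```

Under those assumptions, total spend falls to about 20.8% of its current level, and almost all of what remains comes from the 20% of workloads that stay on frontier models.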

Historically, competition has revolved around quality, compelling companies to default to the most advanced models available. If standard jobs can be handled by budget models without a drop in performance, the financial equation changes drastically. Crucially, the bulk of these savings would come out of the major labs’ revenue, delivering a financial blow to OpenAI and Anthropic just as they prepare for their initial public offerings.

Proving the smaller model works

The central question remains: are companies willing to switch?

Early experiments suggest that with the right architecture, cheaper models can replace their larger counterparts without sacrificing output. In a recent trial, the legal AI platform Harvey cut its inference costs threefold while maintaining quality. Working with inference provider Fireworks AI, the team paired Claude Opus with Fireworks’ GLM 5.1, reserving the larger model for the most intensive tasks. The outcome was a marked reduction in server time and total expenditure.
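
The article does not describe how Harvey's system decides which model handles which task, so the following is only a minimal sketch of the general idea of routing by difficulty. The model names, the prices, the `difficulty` score (assumed to come from some upstream classifier) and the 0.7 threshold are all hypothetical placeholders, not details of the Harvey or Fireworks setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # placeholder price, not a real quote


# Hypothetical models and prices, chosen only to make the arithmetic visible.
SMALL = Model("small-model", 1.0)
LARGE = Model("large-model", 15.0)


def route(difficulty: float, threshold: float = 0.7) -> Model:
    """Send a task to the large model only when its difficulty score is high."""
    return LARGE if difficulty >= threshold else SMALL


def total_cost(tasks: list[tuple[float, int]], threshold: float = 0.7) -> float:
    """Sum the cost of (difficulty, token_count) tasks after routing each one."""
    return sum(
        route(difficulty, threshold).usd_per_million_tokens * tokens / 1_000_000
        for difficulty, tokens in tasks
    )


# Four made-up tasks: (difficulty in [0, 1], tokens consumed).
tasks = [(0.2, 4_000), (0.9, 10_000), (0.4, 6_000), (0.1, 2_000)]

routed = total_cost(tasks)                    # only the hard task goes to LARGE
all_large = total_cost(tasks, threshold=0.0)  # threshold 0 sends everything to LARGE
print(f"routed: ${routed:.3f}, all-large: ${all_large:.3f}")
```

The savings in a sketch like this depend entirely on how many tasks clear the threshold and on the price gap between the two models, which is why the quality of the difficulty estimate matters as much as the choice of small model.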

“Quality comes first, and in legal it always will,” Harvey co-founder Gabe Pereyra told TechCrunch, referring to the AI legal services his startup provides. “However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently.”

This movement is often framed as a battle between Western giants and Chinese or open-weight alternatives, but that framing misses the core issue. The true divide is not proprietary versus open; it is large versus small. A company can cut costs by switching from GPT-5.5 to DeepSeek’s V4 Flash, but moving to GPT-5.4-mini achieves the same end.

While a price war rages between in-house inference from major labs and independently served open-weight models, the specific identity of the smaller model is secondary. What matters is the size.

Challenging the scaling-first dogma

This logic may seem obvious (why use more compute than necessary?), but it directly contradicts the scaling-first philosophy that has dominated the sector. Inspired by the “bitter lesson”, Rich Sutton’s argument that general methods which scale with compute tend to beat hand-crafted ones, labs have aggressively trained the most compute-intensive models possible, pushing the boundaries of AI capability. With heavy investor subsidies keeping prices artificially low, clients had no incentive to choose anything other than the cutting-edge option.

Now, as token prices rise and subsidies dwindle, users face real cost pressures for the first time. It remains uncertain whether this will drive enterprise users toward smaller models. They might instead economise by reducing call frequency, shortening context windows, or abandoning less promising deployments entirely.
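
Why these options are interchangeable from a budgeting point of view becomes clearer if spend is treated as a simple product of call volume, tokens per call and price per token. The sketch below uses that simplified model with invented figures; it ignores real-world details such as separate input and output pricing or caching discounts.

```python
def monthly_spend(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Simplified spend model: calls x tokens per call x price per token."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Invented baseline figures, for illustration only.
baseline = monthly_spend(calls=100_000, tokens_per_call=8_000, usd_per_million_tokens=10.0)

# Each scenario pulls exactly one lever and leaves the other two unchanged.
scenarios = {
    "smaller model (price cut to a tenth)": monthly_spend(100_000, 8_000, 1.0),
    "half as many calls": monthly_spend(50_000, 8_000, 10.0),
    "context cut to a quarter": monthly_spend(100_000, 2_000, 10.0),
}

for label, spend in scenarios.items():
    print(f"{label}: {spend / baseline:.0%} of baseline spend")
```

Because the three factors multiply, a team under cost pressure can reach the same budget target by several routes, which is one reason rising prices do not automatically translate into a move to smaller models.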

However, if most applications perform equally well on smaller models, it could severely dampen the surging demand for inference compute. This development also raises difficult questions about how to justify the exorbitant costs of training frontier models if their unique capabilities are no longer strictly required.

Key takeaways

  • Armstrong predicts that 80% of AI workloads will migrate to models that are 99% cheaper within the next 12 to 18 months.

  • Practical tests, such as Harvey’s legal AI trial, suggest that routing work to smaller models can cut inference costs threefold without compromising quality.

  • The industry’s economic focus is shifting from competing on raw model size to optimising for the most efficient tool that solves the problem.

  • As subsidies fade, the financial viability of training massive frontier models is under threat if they cannot justify their cost over smaller, capable alternatives.
