The token bill comes due: Inside the industry scramble to manage AI’s runaway costs

For creators and developers, the era of unrestricted AI experimentation has hit a wall. “Tokenmaxxing” is giving way to a frantic scramble for financial control. Even as per-token rates drop, the sheer volume of consumption driven by autonomous agents and aggressive adoption strategies has sent bills spiralling out of control. Companies that devoured all-you-can-eat subscriptions in early 2025 now face existential budget crises, forcing them to re-evaluate whether a return on investment can be salvaged from the wreckage.

The shift from capability to cost

The conversation in enterprise AI has changed fundamentally. Alexander Embricos, head of enterprise at OpenAI, told TechCrunch this week that discussions are no longer about whether a model is “good enough.” Instead, the dialogue centres on visibility, auditability, and efficiency.

“Six months ago, I would have a conversation with a customer and it would be all about ‘What can it do? Is it good enough?’ Our conversations are never about that now. Now the conversations are about, ‘hey, we’re spending so much. What visibility do you have? What auditability do you have? What token controls do you have? What is the efficiency of your models?’”

That urgency has startups and established vendors racing to provide the necessary tools. At the same time, a new standards body is being formed to impose the same discipline on AI tokens that FinOps brought to cloud spending.

The scale of the overspend

The crisis is immediate. Uber exhausted its entire 2026 AI coding budget by April. Microsoft revoked developer licenses for Claude Code months after enabling them. A Priceline employee noted that a routine Cursor contract renewal came back four to five times more expensive than expected.

J.R. Storment, executive director of the FinOps Foundation, described the situation as an “existential crisis.” He noted that the industry has shifted from a “go fast” mentality to a desperate need for guardrails.

The catalyst for this spending frenzy was the release of powerful agentic models in November, including Anthropic’s Claude Opus 4.5, OpenAI’s GPT-5.1, and Google’s Gemini 3 Pro. One company reportedly received a $500 million bill after failing to set usage limits for its staff.

Chris Reed, senior director of IT finance at Priceline, compared the situation to a drug epidemic. “They let you try it to get you hooked on it, and now you’re kind of beholden to it,” he said. The company has since begun placing strict token limits on specific groups.
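A per-group token cap of the kind Reed describes can be sketched in a few lines. This is a minimal illustration, not Priceline's system: the `TokenBudget` class, the group names, and the limits are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBudget:
    """Tracks token usage per group against a hard cap (hypothetical design)."""

    limits: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def try_spend(self, group: str, tokens: int) -> bool:
        """Record the spend and return True only if it fits under the group's cap."""
        limit = self.limits.get(group, 0)  # groups without a limit get no allowance
        current = self.used.get(group, 0)
        if current + tokens > limit:
            return False  # request rejected; usage is left unchanged
        self.used[group] = current + tokens
        return True

    def remaining(self, group: str) -> int:
        """Tokens the group can still spend before hitting its cap."""
        return self.limits.get(group, 0) - self.used.get(group, 0)


# Made-up groups and limits, purely for illustration.
budget = TokenBudget(limits={"platform-eng": 1_000_000, "marketing": 200_000})
```

A gateway in front of the model API could call `try_spend` before forwarding each request and refuse the call when it returns `False`.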

Vitaly Gordon, CEO of Faros AI, shared a harrowing anecdote about a CTO whose engineer spent $40,000 on tokens in a single month, leaving management unsure whether to stop the behaviour or encourage it.

Productivity myths and data chaos

Data suggests productivity gains are not keeping pace with spending. A March survey by Faros, involving 20,000 developers, found that while output rose, so did bugs and rewrites. Jellyfish found that engineers using the most tokens were about twice as productive as those using AI less, but they consumed ten times as many tokens to achieve that result.
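The Jellyfish ratio is easier to judge as tokens per unit of output. The short calculation below takes the two reported multiples at face value (roughly 2x output for 10x tokens); the function name is illustrative.

```python
def relative_tokens_per_output(output_multiple: float, token_multiple: float) -> float:
    """How many times more tokens heavy users burn per unit of output."""
    return token_multiple / output_multiple


# Reported figures: about twice the output for ten times the tokens.
heavy_vs_light = relative_tokens_per_output(output_multiple=2.0, token_multiple=10.0)
print(f"Tokens per unit of output, heavy vs. lighter users: {heavy_vs_light:.1f}x")
```

On those numbers, each unit of output from the heaviest users costs about five times as many tokens as the same unit from lighter users.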

Nicholas Arcolano, head of research at Jellyfish, highlighted that expenditure is exploding due to agentic features, with per-developer consumption rising by 18.6 times in nine months.
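To put that figure in monthly terms, the sketch below converts an 18.6-fold rise over nine months into an implied compound monthly rate. It assumes steady compounding, which the source does not claim; real consumption may have grown in bursts.

```python
def implied_monthly_growth(total_multiple: float, months: int) -> float:
    """Compound monthly growth rate that yields total_multiple after `months`."""
    return total_multiple ** (1 / months) - 1


monthly = implied_monthly_growth(total_multiple=18.6, months=9)
print(f"Implied monthly growth: {monthly:.1%}")
```

Under that assumption, per-developer consumption would be growing by roughly 38 percent a month.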

“Whether extreme spend pays off comes down to the ultimate business value of shipped code (e.g. revenue), which most companies still can’t measure,” Arcolano said.

Storment emphasised the sheer scale of the tracking problem. While cloud cost management deals with hundreds of millions of rows per month, token tracking is a problem involving trillions of rows. “You’ve got to fundamentally rethink your tooling, your specs and your accounting systems to do that,” he stated.

A fragmented market seeking order

A market is already forming to address these discrepancies. Pure-play firms like Pay-i are emerging to track, measure, and optimise GenAI investments. Meanwhile, platforms like Paid allow developers to bill users based on actual value rather than flat subscriptions.

Established players are also moving in. Ramp has entered the AI spend management space, while Datadog and New Relic have added token-level observability and GPU monitoring. At the upcoming FinOps X conference, AWS is expected to introduce new financial management features specifically for enterprise AI.

Tiffany Luck, a partner at NEA, predicts that efficiency tools will likely appear at the “harness or app layer.” She pointed to Factory, a startup launching a model router that automatically selects the right model for every task. Gordon expects frontier labs to adopt similar optimisation techniques, routing queries to cheaper models like Sonnet or Haiku even when Opus is called.
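The routing idea can be illustrated with a toy example. This is not Factory's router or any lab's method: the tier names simply echo the models mentioned above, and the keyword-and-length heuristic is a made-up stand-in for whatever signal a real router would use.

```python
# Hypothetical tiers, ordered from cheapest to most capable.
TIERS = ["haiku", "sonnet", "opus"]

# Made-up keywords treated as a hint that a task needs a stronger model.
HARD_KEYWORDS = {"refactor", "architecture", "migrate"}


def estimate_difficulty(prompt: str) -> int:
    """Crude stand-in heuristic: score 0 to 2 from prompt length and keywords."""
    words = prompt.lower().split()
    score = 0
    if len(words) > 200:  # long prompts are assumed to be harder
        score += 1
    if HARD_KEYWORDS.intersection(words):
        score += 1
    return score


def route(prompt: str) -> str:
    """Pick the cheapest tier whose index matches the estimated difficulty."""
    return TIERS[estimate_difficulty(prompt)]


print(route("Fix the typo in the README"))
print(route("Refactor the billing module"))
```

A production router would more plausibly rely on a learned classifier or on feedback from past task outcomes, but the shape is the same: estimate difficulty first, then send the task to the cheapest model expected to handle it.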

However, a critical gap remains: the industry lacks a common language and shared definitions for what a token costs or how to compare spend across vendors. Closing that gap is the primary goal of the new Tokenomics Foundation.

The Foundation aims to establish canonical definitions and open standards for AI token usage and billing. It plans to introduce new metrics such as cost-per-intelligence and tokens-per-watt. A formal launch is scheduled for July, with further member announcements expected next week.

“Token economics is fundamentally more abstract and opaque than anything we’ve managed at this scale before,” said Nishant Gupta, chief availability officer at Salesforce. “It requires a different operational muscle than the one the industry built for cloud.”

Demand, meanwhile, keeps climbing: Goldman Sachs projects global token usage will grow 24-fold by 2030. The companies currently over budget need solutions now, yet the foundation’s first deliverable is months away.

Gordon summed up the situation with a wry observation: “Maybe we created a steam engine, but we still haven’t figured out the assembly line.”

In the meantime, Arcolano advises a strategy of broad, moderate adoption. The best return on investment comes from moving the broad middle of developers from low to moderate usage, rather than pushing heavy users to consume even more.

Key takeaways

Companies are facing immediate budget overruns due to autonomous agents, with some exhausting annual AI budgets by April.
The industry is shifting focus from model capability to strict cost auditability, requiring new tooling capable of processing trillions of data rows.
Emerging standards like those from the Tokenomics Foundation aim to create a common language for token economics and billing metrics.
Experts suggest the most effective ROI strategy is broad, moderate adoption rather than pushing heavy users to consume more tokens.
