Companies Are Making Claude and Codex Talk Like Cavemen to Stop AI’s Soaring Costs

Disclosure: Some links in this article are affiliate links. AI Maestro may earn a commission if you make a purchase, at no…

By AI Maestro June 30, 2026 4 min read
Companies Are Making Claude and Codex Talk Like Cavemen to Stop AI’s Soaring Costs

Companies are forcing their AI tools to speak like cavemen to stop burning through tokens and curb massive spending on artificial intelligence.

The tactic strips the usual verbosity from large language models like Claude Code, Codex, and Gemini. The goal is to replace pleasantries and hedging with blunt commands. Think less “you’re right to push back, I was wrong” and more “Hulk smash.”

The cost of chatter

This approach is a direct response to skyrocketing, unpredictable AI bills. As 404 Media previously reported, firms are scrambling to halt runaway spending, with consulting giant Accenture finding that much of the “soaring token spend” comes from users converting PDFs to presentations.

Developers at OpenAI, Nvidia, and GitHub now use the caveman plugin. A senior OpenAI employee has contributed code to the project, adding support for the Codex tool.

“I made Caveman back in early April because I was using Claude Code heavily and noticed a lot of my token spend was going to unnecessary prose: pleasantries, hedging, transitions, and chatty language that does not really matter inside an agent loop,” Julius Brussee, the creator of caveman, told 404 Media.

Corporate mandates

One company using the tool is electrical and digital infrastructure giant Legrand, which has entered the data center business. An internal Legrand memo shared with 404 Media tells employees, “since the billing system changed and the new quotas were implemented, we all need to be mindful of our usage of AI so we don’t use up our entire budget allowance too quickly.”

The memo lists four actions that will produce “high impact”: not always using the most powerful model; not always using high reasoning settings for the LLMs; using different, more appropriate models for different tasks; and finally, “use ‘caveman skill’ to reduce output consumption (without impacting code).”

Testing the plugin

In 404 Media’s tests of caveman with Claude Code, the plugin does make the LLM’s answers much more to the point. “Want changes to it?” the LLM asked after reviewing previously written code. “Uses official API, not scraping,” the LLM added, describing how the code worked.

When double-checked for the plugin, Claude outputted, “Already active. What you need?”

Caveman can also display the total number of tokens saved. In one test, the tool reported saving around 5,800 tokens, or 65 percent.

A screenshot of caveman in action.
A screenshot of caveman in action.

“It makes the model speak less like a polite chatbot and more like a terse tool,” Brussee said. “Same substance, fewer words. In my evals, Caveman cut output tokens by roughly 65–75 percent versus default verbose output, and still beat a normal ‘be concise’ instruction. That number varies by workflow, but the effect was clear.”

Users can pick their level of “grunt”: lite, full (the default setting), ultra, or Wenyan, which translates the output into classical Chinese characters.

“The goal was to reduce output tokens without touching the parts where exactness matters: code, commands, paths, URLs, numbers, function names, and technical details. Caveman mostly compresses the surrounding language,” Brussee added.

OpenAI involvement

Records on GitHub show that Shayne Sweeney, director of engineering at OpenAI, has contributed code to caveman. A commit a couple of months ago says, “Add Codex plugin support.”

A screenshot of caveman in action.
A screenshot of caveman in action.

Caveman also offers a whole agent that condenses everything down to caveman language. “caveman-code shrink everything — full terminal coding agent, caveman top to bottom. ~2× fewer tokens than Codex on identical tasks. 20+ providers · plan mode · autopilot goal loop · MIT,” caveman’s GitHub repository says.

The tool can also be used with OpenClaw, the agentic AI tool that went massively viral earlier this year.

Why it matters

The plugin is obviously pretty funny but comes in response to a very real problem. In April, GitHub announced it was going to start charging customers per token rather than a flat subscription fee. Uber capped employee’s use of AI tools and the company’s CTO says Uber blew through its entire AI budget in just four months. Walmart also capped AI tool usage. And now you have companies using caveman.

“I’ve heard from many individual developers and engineers inside companies using or testing it, including people at OpenAI, NVIDIA, GitHub, and DEPT,” Brussee said.

In leaked audio obtained by 404 Media, Accenture positioned itself as the cure to this problem, even though it encouraged clients to adopt AI as quickly as possible in the first place. In that audio a senior employee said Accenture had a new opportunity with its clients “to really think about token economics.”

Last year OpenAI CEO Sam Altman suggested that people extending their own pleasantries to LLMs, like “please” and “thank you,” cost OpenAI tens of millions of dollars in electricity costs.

Legrand, OpenAI, Nvidia, and GitHub did not respond to requests for comment on their caveman use.

Caveman’s GitHub repository says near the end: “Caveman save you token, save you money.”

Scroll to Top