Claude Sonnet 5 costs more per task than Anthropic’s previous top model, despite keeping token rates flat and beating the pricier Opus 4.8 on specific agent-based tests.
Artificial Analysis evaluated the model before release and ranked it fifth in its Intelligence Index. At its maximum performance setting, Sonnet 5 scored 53 points, tying GPT-5.5 (high). Four models rank higher: Claude Fable 5, which became generally available today, at 60 points, Opus 4.8 at 56, GPT-5.5 (xhigh) at 55, and Opus 4.7 at 54.
That represents a six-point jump over Sonnet 4.6, which scored 47 points. However, Sonnet 5 consumes far more tokens to achieve these scores.
Same token prices, double the real cost
On paper, Sonnet 5 retains the same token prices as its predecessor: $3 per million input tokens and $15 per million output tokens. Opus 4.8 sits at $5 and $25. Yet according to Artificial Analysis, an average task in the Intelligence Index costs $2.29 with Sonnet 5, versus about $1.97 with Opus 4.8.
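How a model with lower token prices can still cost more per task is easier to see with a small calculation. The following Python sketch uses the list prices quoted above; the token counts in the usage lines are hypothetical and chosen only for illustration, and the function names are not from any Anthropic or Artificial Analysis tooling. It ignores caching and batch discounts.

```python
import math

# List prices in USD per million tokens, as quoted in the article.
PRICES = {
    "Sonnet 5": {"input": 3.0, "output": 15.0},
    "Opus 4.8": {"input": 5.0, "output": 25.0},
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one task at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def break_even_token_ratio(cheap: str, expensive: str) -> float:
    """Factor by which the cheaper model can exceed the expensive model's
    token usage before the two cost the same per task.

    Assumes both models use the same input/output mix, and only works when
    input and output prices differ by the same factor.
    """
    c, e = PRICES[cheap], PRICES[expensive]
    in_ratio = e["input"] / c["input"]
    out_ratio = e["output"] / c["output"]
    if not math.isclose(in_ratio, out_ratio):
        raise ValueError("input and output price ratios differ")
    return in_ratio


# Hypothetical token counts, for illustration only.
opus_cost = task_cost("Opus 4.8", 100_000, 20_000)
sonnet_cost = task_cost("Sonnet 5", 200_000, 40_000)  # twice the tokens
print(f"Opus 4.8: ${opus_cost:.2f}  Sonnet 5: ${sonnet_cost:.2f}")
print(f"Break-even token ratio: {break_even_token_ratio('Sonnet 5', 'Opus 4.8'):.2f}x")
```

Under these assumptions, Sonnet 5 becomes the more expensive option as soon as it uses more than about 1.67 times as many tokens as Opus 4.8 on the same task.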
At the maximum performance setting, Sonnet 5 burns through about 40 percent more output tokens per task than Sonnet 4.6. In agent-based knowledge work benchmarks like AA-Briefcase and GDPval-AA, it runs about three times as many agent loops as its predecessor. Sonnet 4.6 cost about $1.20 per task, so Sonnet 5's $2.29 is nearly double. In return, Sonnet 5 beats Opus 4.8 on some of these tasks.
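The "nearly doubled" figure can be checked directly from the per-task costs Artificial Analysis reported. This is a minimal Python sketch; the dictionary and function names are illustrative, and the dollar values are the approximate figures quoted in the article.

```python
# Approximate per-task costs (USD) in the Intelligence Index,
# as reported by Artificial Analysis at regular prices.
COST_PER_TASK = {
    "Sonnet 4.6": 1.20,
    "Sonnet 5": 2.29,
    "Opus 4.8": 1.97,
}


def cost_ratio(model: str, baseline: str) -> float:
    """How many times more the model costs per task than the baseline."""
    return COST_PER_TASK[model] / COST_PER_TASK[baseline]


print(f"Sonnet 5 vs Sonnet 4.6: {cost_ratio('Sonnet 5', 'Sonnet 4.6'):.2f}x")
print(f"Sonnet 5 vs Opus 4.8:   {cost_ratio('Sonnet 5', 'Opus 4.8'):.2f}x")
```

On these numbers, Sonnet 5 costs roughly 1.91 times as much per task as Sonnet 4.6 and about 16 percent more than Opus 4.8.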
Anthropic is running a promotional rate of $2 per million input tokens and $10 per million output tokens through September 1, but Artificial Analysis based its results on regular prices.
Complex reasoning still exposes Sonnet 5’s limits
Sonnet 5 still falls short of larger models on reasoning- and knowledge-heavy benchmarks. On CritPt, a frontier physics reasoning test from Argonne National Laboratory and the University of Illinois, it scored 17 percent. That is 14 percentage points above its predecessor but below GLM-5.2, Claude Opus, Fable, and GPT-5.5 in their higher configurations.
Elsewhere, Sonnet 5 shows solid gains over Sonnet 4.6: a 9-point jump on Terminal-Bench v2.1, 10 points on Humanity’s Last Exam, and 7 points on SciCode. Scores on the remaining evaluations stayed roughly flat.
Anthropic keeps raising prices without saying so
Anthropic has done this before. When Opus 4.7 launched, token prices stayed flat on paper, but a new tokenizer chopped the same text into approximately 30 percent more tokens, inflating the real bill. Developer Abhishek Ray measured a 1.325x to 1.47x increase, and a community analysis of over 483 submissions found a 37.4 percent jump in tokens per request. With Sonnet 5, the tokenizer issue is compounded by the model’s more agentic behavior, which eats through far more tokens per task.
Anthropic’s models keep getting pricier with each generation, sometimes dramatically so, yet the official price lists do not reflect it. That kind of hidden cost creep is a hard sell when Chinese competitors like DeepSeek V4 Pro and GLM-5.2 offer competitive performance at a fraction of the cost in the mid-range segment where Sonnet sits.
AI providers need more transparent pricing, such as cost per standardized task or per real-world knowledge work job, rather than raw token prices, which lose their meaning when token consumption per task varies this much between models.
What it means
Developers using Claude for complex workflows should expect bills to rise even if the published token rates do not change. The combination of higher consumption and tokenizer shifts means the effective cost per task has nearly doubled since Sonnet 4.6.




