Anthropic Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8: Agentic Coding Benchmarks, API Pricing, and Cost-Performance Tradeoffs Compared

Disclosure: Some links in this article are affiliate links. AI Maestro may earn a commission if you make a purchase, at no…

By AI Maestro, June 30, 2026

Anthropic has released Claude Sonnet 5, positioning it as the company’s most capable agentic model in the mid-tier. It is now the default for Free and Pro plans, while Max, Team, and Enterprise users can select it directly. The model is live in Claude Code and on the Claude Platform.

Key takeaways

  • Sonnet 5 is Anthropic's strongest mid-tier model for agentic work and closes much of the gap to the flagship Opus 4.8.
  • It outperforms Sonnet 4.6 on every benchmark where both models were reported, including 63.2% on SWE-bench Pro, 81.2% on OSWorld-Verified, and 57.4% on Humanity’s Last Exam (HLE) with tools.
  • Introductory pricing is $2 per million input tokens and $10 per million output tokens through August 31, 2026, rising to $3/$15 afterward. Opus 4.8 costs $5/$25.
  • It offers the best value on low- and medium-effort tasks. At xhigh effort, its costs can exceed Opus 4.8's for similar quality.
  • It is safer than Sonnet 4.6, with deliberately reduced cyber capabilities, though Opus 4.8 remains the choice for accuracy-critical work.

Model positioning

Sonnet sits between the cheaper Haiku 4.5 and the flagship Opus 4.8. Sonnet 5 succeeds Sonnet 4.6, which was released in February 2026. Anthropic frames this update around agentic reliability rather than a single headline metric.

In practice, this means longer task chains without losing context. It also means better self-correction when a tool call fails and steadier behavior across extended sessions inside Claude Code or Cowork.

The model exposes four effort levels: low, medium, high, and xhigh. Higher effort spends more tokens on reasoning, raising both quality and cost.

Sonnet 5 also uses an updated tokenizer, the same one introduced with Opus 4.7. The same text can map to roughly 1.0 to 1.35 times as many tokens as under the earlier tokenizer, so an unchanged per-token price can still mean a higher bill for the same prompt.
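To see what that range means for a budget, the sketch below bounds the token count and input cost of a prompt whose size you measured under the earlier tokenizer. It only applies the 1.0x to 1.35x factors quoted above; `token_range` and `input_cost_range` are hypothetical helper names, and real inflation depends on the actual text.

```python
# Sketch: bound the token count (and input cost) of the same text under the
# updated tokenizer, using the 1.0x to 1.35x range quoted in the article.
# Assumption: `old_tokens` was measured with the earlier tokenizer.

LOW_FACTOR = 1.0
HIGH_FACTOR = 1.35


def token_range(old_tokens: int) -> tuple[int, int]:
    """Return (low, high) token estimates under the updated tokenizer."""
    return round(old_tokens * LOW_FACTOR), round(old_tokens * HIGH_FACTOR)


def input_cost_range(old_tokens: int, price_per_mtok: float) -> tuple[float, float]:
    """Return (low, high) input cost in dollars at a $/MTok price."""
    low, high = token_range(old_tokens)
    return low * price_per_mtok / 1_000_000, high * price_per_mtok / 1_000_000


if __name__ == "__main__":
    # 1M tokens under the earlier tokenizer, at the $3/MTok standard input rate.
    print(token_range(1_000_000))
    print(input_cost_range(1_000_000, 3.0))
```

At the same $3/MTok rate, input that cost $3.00 under the earlier tokenizer could cost anywhere from $3.00 to about $4.05.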

Performance benchmarks

The Anthropic team published a table comparing Sonnet 5, Sonnet 4.6, and Opus 4.8. Sonnet 5 beats its predecessor in every tested category and closes much of the gap to Opus 4.8.

On agentic coding (SWE-bench Pro), Sonnet 5 scores 63.2%. Sonnet 4.6 scored 58.1%. Opus 4.8 still leads at 69.2%.

On computer use (OSWorld-Verified), Sonnet 5 posts 81.2% against Sonnet 4.6’s 78.5%. On Terminal-Bench 2.1, it reaches 80.4% versus Sonnet 4.6’s 67.0%.

On Humanity’s Last Exam with tools, Sonnet 5 hits 57.4%. That nearly matches Opus 4.8 at 57.9%.

On one benchmark, Sonnet 5 edges ahead of Opus 4.8: on the GDPval-AA v2 knowledge-work benchmark, it scores 1,618 against Opus 4.8’s 1,615.
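"Closes much of the gap" can be made concrete for the two benchmarks where all three models have published scores. The sketch below computes what share of the distance from Sonnet 4.6 to Opus 4.8 Sonnet 5 covers; the scores are copied from the figures above, and `gap_closed` is a hypothetical helper name.

```python
# Sketch: what fraction of the Sonnet 4.6 -> Opus 4.8 gap Sonnet 5 closes,
# using only scores reported for all three models.

SCORES = {
    # benchmark: (Sonnet 4.6, Sonnet 5, Opus 4.8), in percent
    "SWE-bench Pro": (58.1, 63.2, 69.2),
    "Humanity's Last Exam (tools)": (46.8, 57.4, 57.9),
}


def gap_closed(old: float, new: float, top: float) -> float:
    """Share of the old-to-top gap covered by the new model (1.0 = gap closed)."""
    return (new - old) / (top - old)


if __name__ == "__main__":
    for name, (old, new, top) in SCORES.items():
        print(f"{name}: {gap_closed(old, new, top):.1%} of the gap closed")
```

By this measure Sonnet 5 covers about 46% of the SWE-bench Pro gap and about 95% of the HLE gap.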

Effort levels and cost

The cost-performance story is the most important part for developers. Sonnet 5 is a strict improvement over Sonnet 4.6 across every effort level. The clearest value appears at low and medium effort.

At those levels, Sonnet 5 delivers quality that no earlier Sonnet model offered at the same price. Opus 4.8 remains the accuracy leader at the top of the range.
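The xhigh caveat comes down to token counts: Sonnet 5 is cheaper per token, so it only costs more than Opus 4.8 on a task when it spends enough extra tokens. The sketch below computes that break-even multiple from the list prices in this article. The task sizes are hypothetical, caching and batch discounts are ignored, and it assumes Sonnet 5 scales input and output tokens by the same factor.

```python
# Sketch: how many times more tokens Sonnet 5 can spend on a task before it
# costs more than Opus 4.8 on the same task. Prices are $/MTok list prices;
# the example token counts are hypothetical.

PRICES = {
    "sonnet-5-intro": (2.0, 10.0),  # (input, output)
    "sonnet-5-standard": (3.0, 15.0),
    "opus-4.8": (5.0, 25.0),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at list prices (no caching or batch discount)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def break_even_multiplier(sonnet: str, input_tokens: int, output_tokens: int) -> float:
    """Token multiple at which Sonnet 5's cost equals Opus 4.8's.

    Assumes Sonnet 5 scales input and output tokens by the same factor.
    """
    opus = task_cost("opus-4.8", input_tokens, output_tokens)
    return opus / task_cost(sonnet, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical task: Opus 4.8 uses 200k input and 20k output tokens.
    for tier in ("sonnet-5-intro", "sonnet-5-standard"):
        print(tier, round(break_even_multiplier(tier, 200_000, 20_000), 2))
```

Because Opus 4.8's input and output prices are the same multiple of Sonnet 5's, the result does not depend on the token mix: 2.5x at introductory pricing and about 1.67x at standard pricing.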

A practical routing policy follows from this. Send most agentic coding, tool use, and knowledge work to Sonnet 5. Reserve Opus 4.8 for accuracy-critical tasks. Keep Haiku 4.5 for high-volume, latency-sensitive calls.
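A policy like this is easy to encode as a small dispatch function in front of the API client. The sketch below is one possible shape, not an Anthropic feature: the `TaskKind` categories, the `route` helper, and the Opus 4.8 and Haiku 4.5 model IDs are assumptions (only `claude-sonnet-5` is given in this article), so check real IDs against the model list before use.

```python
# Sketch of the three-tier routing policy. TaskKind, RouteDecision, route(),
# and the Opus/Haiku model IDs are hypothetical placeholders.

from dataclasses import dataclass
from enum import Enum, auto


class TaskKind(Enum):
    AGENTIC_CODING = auto()
    TOOL_USE = auto()
    KNOWLEDGE_WORK = auto()
    HIGH_VOLUME = auto()  # high-volume, latency-sensitive calls


@dataclass(frozen=True)
class RouteDecision:
    model: str
    reason: str


SONNET_5 = "claude-sonnet-5"
OPUS_4_8 = "claude-opus-4-8"    # assumed ID, verify before use
HAIKU_4_5 = "claude-haiku-4-5"  # assumed ID, verify before use


def route(kind: TaskKind, accuracy_critical: bool = False) -> RouteDecision:
    """Pick a model for one request following the three-tier policy."""
    if accuracy_critical:
        return RouteDecision(OPUS_4_8, "accuracy-critical")
    if kind is TaskKind.HIGH_VOLUME:
        return RouteDecision(HAIKU_4_5, "high-volume, latency-sensitive")
    # Agentic coding, tool use, and knowledge work default to Sonnet 5.
    return RouteDecision(SONNET_5, "default tier")


if __name__ == "__main__":
    print(route(TaskKind.AGENTIC_CODING))
    print(route(TaskKind.KNOWLEDGE_WORK, accuracy_critical=True))
    print(route(TaskKind.HIGH_VOLUME))
```

Checking `accuracy_critical` first is a design choice: a request flagged as accuracy-critical goes to Opus 4.8 even if it would otherwise count as high-volume.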

Use cases

Early access partners described concrete workflows. Their reports map to common engineering jobs.

  • Multi-step software engineering: One tester asked Sonnet 5 to investigate a bug. It wrote a reproducing test, implemented the fix, then confirmed the bug returned without the change. It did this in a single pass.
  • Brownfield debugging: Another partner ran it on hard pull requests. The model traced failures to their root causes. It shipped durable fixes rather than symptom patches.
  • Business automation: Zapier handed it a two-part job. It updated Salesforce account tiers, then sent a launch email to enterprise contacts. It finished the task end to end.
  • Computer-use agents: Pace runs insurance workflows like submission intake and loss runs. Its agents act on the operational systems teams already use.
  • Data exploration: ClickHouse agents query live data and produce insights on the fly. Faster reasoning means faster time-to-insight for analysts.

Comparison table

| Metric / Spec | Sonnet 4.6 | Sonnet 5 | Opus 4.8 |
|---|---|---|---|
| Agentic coding (SWE-bench Pro) | 58.1% | 63.2% | 69.2% |
| Terminal-Bench 2.1 | 67.0% | 80.4% | not reported |
| Computer use (OSWorld-Verified) | 78.5% | 81.2% | not reported |
| Humanity’s Last Exam (with tools) | 46.8% | 57.4% | 57.9% |
| Knowledge work (GDPval-AA v2) | not reported | 1,618 | 1,615 |
| Input price ($/MTok) | 3 | 2 intro, then 3 | 5 |
| Output price ($/MTok) | 15 | 10 intro, then 15 | 25 |

Sonnet 5’s introductory pricing runs through August 31, 2026. Standard pricing of $3/$15 begins after that date. Standard prompt caching (cache reads at 0.1x the input price) and the 50% Batch API discount also apply. Per token, Sonnet 5 undercuts GPT-5.5 and Gemini 3.1 Pro but costs more than Gemini 3.5 Flash. Anthropic lists a 1M-token context window for Sonnet 5 in its launch post; it does not publish context figures for the other models in this comparison.
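Putting those pricing rules together, the sketch below estimates the cost of one Sonnet 5 request from its token counts. It applies the introductory/standard switch on August 31, 2026, bills cache reads at 0.1x the input price, and halves the total for batch requests. Two assumptions to flag: cache-write surcharges are not modeled, and the batch discount is assumed to stack with cache pricing. `sonnet5_cost` is a hypothetical helper name.

```python
# Sketch: estimate one Sonnet 5 request's cost. Cache writes are NOT modeled,
# and stacking the batch discount with cache pricing is an assumption.

from datetime import date

INTRO_END = date(2026, 8, 31)  # introductory pricing runs through this day
INTRO = (2.0, 10.0)            # ($/MTok input, $/MTok output)
STANDARD = (3.0, 15.0)
CACHE_READ_MULTIPLIER = 0.1
BATCH_DISCOUNT = 0.5


def sonnet5_cost(
    day: date,
    input_tokens: int,
    output_tokens: int,
    cached_input_tokens: int = 0,
    batch: bool = False,
) -> float:
    """Dollar cost; `input_tokens` excludes tokens read from the cache."""
    in_price, out_price = INTRO if day <= INTRO_END else STANDARD
    cost = (
        input_tokens * in_price
        + cached_input_tokens * in_price * CACHE_READ_MULTIPLIER
        + output_tokens * out_price
    ) / 1_000_000
    return cost * BATCH_DISCOUNT if batch else cost


if __name__ == "__main__":
    # Hypothetical request: 100k fresh input, 900k cached input, 50k output.
    usage = dict(input_tokens=100_000, output_tokens=50_000, cached_input_tokens=900_000)
    for day in (date(2026, 8, 31), date(2026, 9, 1)):
        print(day, round(sonnet5_cost(day, **usage), 4))
    print("batch", round(sonnet5_cost(date(2026, 9, 1), batch=True, **usage), 4))
```

For that hypothetical request, the estimate moves from $0.88 on the last introductory day to $1.32 the day after, or $0.66 through the Batch API.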

How to call the API

Calling Sonnet 5 through the API works like calling any other Anthropic model; only the model string changes, to claude-sonnet-5.

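```python
import anthropic

# The client reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Find the race condition in worker.py and ship a tested fix."}
    ],
)

# The response is a list of content blocks; print only the text blocks.
for block in message.content:
    if block.type == "text":
        print(block.text)
```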