Better Models: Worse Tools

Armin Ronacher reports that newer Anthropic models such as Opus 4.8 and Sonnet 5 frequently call Pi’s edit tool with invented fields in the nested edits array. The edit content itself is usually correct, but the arguments do not match the schema. The model adds keys the schema does not define, so Pi rejects the tool call and asks the model to try again. Malformed tool calls are not new in themselves, since models have always emitted them occasionally. What surprised Ronacher is that the problem is getting worse with newer Anthropic models: both Opus 4.8 and Sonnet 5 show it, while none of the older models do. In other words, the family’s state-of-the-art (SOTA) models handle this specific tool schema worse than their older siblings.
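To make the failure mode concrete, here is a minimal sketch of the strict validation a harness might run on an edit call with a nested edits array. The field names ("path", "edits", "old_text", "new_text"), the invented "replace_all" key, and the helper functions are all hypothetical and are not Pi’s actual schema. Unknown keys are rejected the way JSON Schema’s "additionalProperties": false would reject them. The edit content can be perfectly correct and the call still fails.

```python
from typing import Any

# Hypothetical edit-tool schema: illustrative names, not Pi's real schema.
TOP_LEVEL_KEYS: dict[str, type] = {"path": str, "edits": list}
EDIT_KEYS: dict[str, type] = {"old_text": str, "new_text": str}


def check_object(obj: Any, spec: dict[str, type], where: str) -> list[str]:
    """Return error messages for missing, mistyped, or unknown keys."""
    if not isinstance(obj, dict):
        return [f"{where}: expected an object"]
    errors = []
    for key, expected in spec.items():
        if key not in obj:
            errors.append(f"{where}: missing required key '{key}'")
        elif not isinstance(obj[key], expected):
            errors.append(f"{where}.{key}: expected {expected.__name__}")
    # Reject any key the schema does not define (like additionalProperties: false).
    for key in obj:
        if key not in spec:
            errors.append(f"{where}: unknown key '{key}'")
    return errors


def validate_edit_call(args: Any) -> list[str]:
    """Validate the top-level arguments, then every entry of the nested edits array."""
    errors = check_object(args, TOP_LEVEL_KEYS, "args")
    edits = args.get("edits") if isinstance(args, dict) else None
    if isinstance(edits, list):
        for i, edit in enumerate(edits):
            errors.extend(check_object(edit, EDIT_KEYS, f"args.edits[{i}]"))
    return errors


# Correct edit content, but with an invented (hypothetical) "replace_all" key.
call = {
    "path": "src/app.py",
    "edits": [{"old_text": "x = 1", "new_text": "x = 2", "replace_all": True}],
}
errors = validate_edit_call(call)
if errors:
    # A harness would send this message back and ask the model to retry.
    print("Tool call rejected: " + "; ".join(errors))
    # prints: Tool call rejected: args.edits[0]: unknown key 'replace_all'
```

Rejecting the whole call keeps the tool predictable, but it also means every invented key costs a full retry round-trip.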

Ronacher theorises that this happens because recent Anthropic models have been specifically trained with reinforcement learning (RL) to use the edit tools built into Claude Code. The unfortunate side effect is that other coding harnesses, such as Pi, may find their own custom edit tools used incorrectly more often. The situation highlights a divergence in tool training strategies: Anthropic’s tooling edits files by search and replace, while OpenAI Codex uses an apply-patch mechanism. Developers building third-party coding harnesses now face a dilemma over how to design edit tools that accommodate these different model behaviours.
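The two editing styles ask the model for quite different output. The sketch below contrasts them under stated assumptions. With search and replace, the model must reproduce an exact snippet that occurs once in the file. With a patch, it writes context lines plus "-" (removed) and "+" (added) lines. Both functions and the hunk format are simplified, hypothetical illustrations. They are not faithful copies of Claude Code’s edit tool or of the real Codex apply-patch format, which has its own envelope and headers.

```python
def apply_search_replace(text: str, old: str, new: str) -> str:
    """Search-and-replace style (simplified): `old` must occur exactly once."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(old, new, 1)


def apply_hunk(text: str, hunk: str) -> str:
    """Patch style (simplified, hypothetical format): ' ' context, '-' removed, '+' added."""
    before, after = [], []
    for line in hunk.splitlines():
        tag, body = line[:1], line[1:]
        if tag not in (" ", "-", "+"):
            raise ValueError(f"bad hunk line: {line!r}")
        if tag != "+":  # context and removed lines must already exist in the file
            before.append(body)
        if tag != "-":  # context and added lines make up the replacement
            after.append(body)
    lines = text.split("\n")
    n = len(before)
    # Locate the hunk's "before" lines; require a single unambiguous position.
    matches = [i for i in range(len(lines) - n + 1) if lines[i:i + n] == before]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match, found {len(matches)}")
    i = matches[0]
    return "\n".join(lines[:i] + after + lines[i + n:])


source = "def greet():\n    return 'hi'\n"
hunk = " def greet():\n-    return 'hi'\n+    return 'hello'"
print(apply_search_replace(source, "'hi'", "'hello'"))
print(apply_hunk(source, hunk))
```

Both calls produce the same file, but a model trained heavily on one output shape may drift toward it when a harness exposes the other.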

* Opus 4.8 and Sonnet 5 exhibit the regression, while older Anthropic models do not.
* Ronacher suspects the cause is RL training focused on Claude Code’s own edit tool.
* OpenAI’s models are trained differently, around Codex’s apply-patch mechanism.
