I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you’re running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse.

So I built SmallCode. It’s designed from the ground up for small local models.

The result: 87/100 benchmark tasks pass with a Gemma 4 model that only activates 4B parameters per token. OpenCode scores ~75% with 14B models. The harness does the heavy lifting, not the model size.

How it works (the tricks that make small models reliable):

Compound tools: Instead of making the model chain 4 tool calls (find file → read file → edit file → verify), SmallCode gives it one tool that does all 4. Small models lose coherence after 3+ sequential calls. This cuts failures in half.
Improvement loop: Every time the model writes code, SmallCode instantly compiles/lints it. If it fails, it feeds the errors back automatically. The model doesn’t need to be smart enough to get it right first try, it just needs to fix errors when shown them.
Decompose on failure: If the model fails the same thing twice, SmallCode stops retrying and instead breaks the problem into smaller pieces. "Fix this 200-line file" becomes "fix line 45 only."
Escalation: If even decompose fails and you have a Claude/OpenAI key configured, it auto-escalates to the bigger model for just that one task. You stay local 95% of the time, cloud 5%.
Token budgeting: Small models have 32k-256k context. SmallCode never dumps a whole file in. It summarizes, truncates, and manages every token so the model never sees "…" truncation in the middle of important code.
Code graph: Instead of grep-searching your codebase, SmallCode indexes your code into a symbol graph (functions, classes, who-calls-what). When you ask "how does auth work," it walks the graph and returns just the relevant connected code, not 15 random file snippets.

What it looks like:

Full-screen terminal UI (like OpenCode/vim), scrollable chat, command palette with /, plugin system, persistent memory across sessions.

What it doesn’t do:

No LSP integration (yet)
No multi-session (yet)
No desktop app
Doesn’t compete with Claude Code for frontier model users

Install:

npm install -g smallcode cd your-project smallcode

Point it at LM Studio, Ollama, or any OpenAI-compatible endpoint.

MIT licensed, everything’s on GitHub: https://github.com/Doorman11991/smallcode

Happy to answer questions about the architecture or benchmark methodology.

submitted by /u/Glittering_Focus1538

Source Read original →

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here’s how

Empowering Businesses with AI: Smart Tools, Smarter Business Decisions.

follow us

Popular Tag

Popular Post

Some of the nation’s…

Meituan Releases LongCat-2.0: A…

Amazon will stop accepting…