I’m honestly so tired of the “bro u just need to prompt better” crowd when talking about coding agents churning out slop after 20 turns. I finally sat down and audited my API logs and prompt payloads this week because my token usage was off the charts, and I realized something that drove me absolutely crazy.
The models (even the big ones) aren’t degrading or getting lobotomized. They are literally just suffocating on their own bloated context windows before they even attempt to do any actual reasoning.
What these agents actually do under the hood:
- Blind exploration: They recursively grep and dump like 40 different files into context just to find one stupid utility function. Half the time, the agent can’t even find my existing component, so it just hallucinates a duplicate from scratch lmao.
- Raw ingestion: Dumping a massive 2k-line file into the prompt just to update a 5-line interface. Just why?
- Tool diarrhea: Verbose test logs and massive MCP tool definitions eating up like 30k tokens before the model even generates a single token of code.
- Goldfish memory: Every single session is Groundhog Day. Zero actual project awareness, so the agent just re-reads the exact same files over and over.
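To make the raw ingestion point concrete, here is a minimal sketch of the alternative, assuming a Python codebase: boil a file down to its class and function signatures with the stdlib `ast` module and hand the model that outline first, pulling in full bodies only when needed. The `outline` function and the sample source are hypothetical, made up for illustration. This is not how Cursor or Claude work internally.

```python
import ast


def _signature(node, indent=""):
    # Positional parameter names only; *args, **kwargs and defaults are skipped.
    args = ", ".join(a.arg for a in node.args.args)
    return f"{indent}def {node.name}({args})  # line {node.lineno}"


def outline(source: str) -> list[str]:
    """Return one line per top-level class, class method, and function."""
    funcs = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, funcs):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}  # line {node.lineno}")
            # Only direct methods of the class; nested classes are ignored.
            lines.extend(
                _signature(item, indent="    ")
                for item in node.body
                if isinstance(item, funcs)
            )
    return lines


if __name__ == "__main__":
    # Hypothetical file contents, standing in for a much larger module.
    sample = (
        "import os\n"
        "\n"
        "class UserStore:\n"
        "    def get(self, user_id):\n"
        "        return None\n"
        "\n"
        "def slugify(text, sep):\n"
        "    return text\n"
    )
    print("\n".join(outline(sample)))
```

The outline costs a handful of tokens per symbol instead of the whole file, and the line numbers tell the agent exactly which slice to request if it really needs a body.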
If you look at what Cursor or Claude actually do under the hood on any decent-sized repo (like 10k+ lines), it’s a nightmare.
Key Takeaways
- The models are suffocating on their own bloated context windows before they even attempt to reason.
- The agents are fundamentally blind to how a codebase is actually structured until they burn all your tokens reading raw text.
- There’s a productivity paradox where we save an hour of typing just to spend five hours fixing the architectural spaghetti the agent makes. We need an open-source agent that understands structure before wasting context windows on raw text.
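As a rough illustration of "understands structure before wasting context", here is a minimal sketch of a persistent symbol index, again assuming a Python-only repo. It maps each file to its top-level function and class names, caches the result on disk keyed by a content hash, and re-parses only files that changed between sessions. The names `build_index`, `find_symbol`, and the JSON cache layout are assumptions invented for this example, not any existing tool's format.

```python
import ast
import hashlib
import json
from pathlib import Path


def file_symbols(source: str) -> list[str]:
    """Names of top-level functions and classes in one Python source string."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # Unparseable file: index it as having no symbols.
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [node.name for node in tree.body if isinstance(node, kinds)]


def build_index(root: Path, cache_path: Path) -> dict[str, list[str]]:
    """Map relative file path -> symbol names, re-parsing only changed files."""
    old = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    new_cache = {}
    for path in sorted(root.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        key = path.relative_to(root).as_posix()
        entry = old.get(key)
        if entry is None or entry["sha256"] != digest:
            # New or modified file: parse it again.
            entry = {"sha256": digest, "symbols": file_symbols(source)}
        new_cache[key] = entry
    # Deleted files drop out because only files seen this run are written back.
    cache_path.write_text(json.dumps(new_cache, indent=2))
    return {key: entry["symbols"] for key, entry in new_cache.items()}


def find_symbol(index: dict[str, list[str]], name: str) -> list[str]:
    """Files that already define `name`, to check before writing a duplicate."""
    return [path for path, names in index.items() if name in names]
```

An agent could call `find_symbol(index, "slugify")` before generating a new helper, which targets both the duplicate-component problem and the re-read-everything-every-session problem. A real version would need more than top-level names (imports, call graph, other languages), so treat this as a starting point only.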
Is anyone else in here working on fixing this locally? Are we really just accepting this weird productivity paradox?