Warning: long post ahead. On the bright side, it's 100 percent human-written, typos and all. No AI slop was used to generate any of the following post. Bask in the warm glow of our increasingly rare shared humanity, gentle reader.

I wanted to report my local model coding experience tonight. One of my board game hobby websites (static sites hosted on GitHub Pages) had an annoying UI bug.

- Hardware: MacBook Pro M4 Max with 64GB of RAM.
- Model backend: oMLX.
- Model: Gemma4-26B-A4B-it-oQ6.
- Agentic harness: Pi.

This Gemma4-26B MoE model runs pretty fast on my machine: 800 tokens/second prompt processing, 63 tokens/second token generation. Qwen3.6-35B is my usual daily driver, and I have only used Gemma4 for chat purposes to date. But tonight I decided to test it for coding.

I described the UI bug to Gemma4 verbally, and since it has vision capabilities, I took a screenshot of the issue and uploaded it to the model for good measure. Things started out promising. Gemma4 analyzed the issue, figured out the root cause, and started reading the site's CSS file to insert a fix in the right spot.

That's when things started to go off the rails. Gemma4 fell into a recursive doom loop of read, edit, fail, then read again. Several times I stopped the model, told it that it was looping, and asked how I could help. Gemma4 apologized, acknowledged that it was looping, even appeared to identify why it was looping, said that it would try a different approach, then just fell into another loop.

After about 15 minutes of it wasting my time trying to redirect Gemma4, I gave up and switched to Qwen3.6-27B-UD-MLX-8bit. That's right — we're going full-on 27 billion dense parameters on your ass, CSS bug. None of this puny MoE nonsense. Time to roll up our (virtual) sleeves and get down to business.
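As an aside: that read → edit → fail → read cycle is the classic agent doom loop, and a harness can catch it mechanically instead of relying on the user to notice. Here's a minimal, hypothetical sketch — the `LoopDetector` class and its parameters are my own invention for illustration, not part of Pi or any real harness:

```python
from collections import deque


class LoopDetector:
    """Flags an agent that keeps repeating the same tool call.

    Hypothetical sketch: a sliding window of recent (tool, args)
    pairs; if the same call shows up too many times, assume a loop.
    """

    def __init__(self, window: int = 6, threshold: int = 3):
        self.recent = deque(maxlen=window)  # sliding window of recent calls
        self.threshold = threshold          # repeats that count as a loop

    def record(self, tool: str, args: str) -> bool:
        """Record a tool call; return True if we look stuck in a loop."""
        call = (tool, args)
        self.recent.append(call)
        return self.recent.count(call) >= self.threshold


# Simulate the read/edit cycle from the post (file name is made up):
detector = LoopDetector()
calls = [("read", "style.css"), ("edit", "style.css"),
         ("read", "style.css"), ("edit", "style.css"),
         ("read", "style.css")]
flags = [detector.record(tool, args) for tool, args in calls]
# the third identical read of style.css trips the detector
```

Exact-match on arguments is crude (a model that varies its edits slightly each pass would slip through), but even this much would have interrupted Gemma4's loop far sooner than I did.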
I started a new Pi session with Qwen3.6-27B loaded up. Described the UI bug verbally. Didn't even bother to upload a screenshot. That was enough for Qwen3.6-27B to understand the issue.

Then it started THINKING. It chewed up about a quarter of my context window just figuring out the bug from all angles, paragraph upon paragraph of back and forth with itself. And after all that thinking, Qwen3.6-27B fixed the bug in a single shot.

To me, this is a clear real-world illustration and confirmation of certain assumptions I have made in my few months of exploring local models.

TL;DR: Gemma4 MoE is fast but doom-loopy, while Qwen3.6 (dense) is slow but spot-on accurate.

Edit: added the word
Key Takeaways
- Gemma4 MoE is fast but prone to looping and mistakes.
- Qwen3.6 (dense model) is slower but more accurate and precise for critical tasks like fixing UI bugs.
- The choice between MoE and dense models depends on the task at hand: Gemma4 for quick checks, Qwen3.6-27B for deep fixes.
Originally published at reddit.com.


