“`html
The Pac-Man Benchmark: Finally a Viable Local Agent with Qwen 3.6 27b
I like to test new models by one-shotting (with a good prompt) a single webpage clone of the classic arcade game Pac-Man. So far, all of them— including Anthropic, ChatGPT, and Google models —have failed. Most have miserably so. The best until now was GLM 5.1.
That changed when I tried it with Qwen 3.6 27b F16. Out of the three attempts, two were by far the best, with only minor errors! However, as soon as I dropped to 8-bit quantization, I could not replicate those good results even after trying five plus times. This shows what I had been saying for a long time: there is a world of difference between a 16-bit and an 8-bit quant, despite most people claiming it is lossless or nearly so.
The results were so good that since I was testing the LLama.cpp MTP speculative decoding PR (not yet merged at that time) with my own quants, and developing my own fixed jinja chat template for Qwen 3.5/3.6, I thought it would be fun to push Qwen 3.6 27b F16 through a proper agentic coding workflow. The results were brilliant and speak for themselves. You can try the full single-page game here.
Lessons Learned and Observations:
- A good chat template is critical. The official chat template was unusable due to it being only targeted at vLLM, and therefore full of errors in other tools. I started with community templates, which were improvements, but still had many quirks. This is why I started fixing the bugs one by one in the official templates, and slowly improving it. The beginning of the agentic sessions were painful due to many quirks and errors. But once I got the template well tuned, it felt like I had unlocked a new level of intelligence in the model.
- MTP speculative decoding does not accelerate all tasks identically. Basically it is most efficient at deterministic tasks like coding, and least so at creative tasks like brainstorming. For this Pac-Man development, my generative tok/s varied between 8 tok/s and 18 tok/s depending on the task. For reference, without MTP, I get 6.6 tok/s with the same model and quant.
- Not all harnesses are equals both in terms of code quality but also in terms of impact on speed. Most people already know that the coding harness has a huge impact on quality; Claude Code is considered the gold standard. This is what I use for normal daily coding. In this case, I started with Qwen CLI, mostly because of the chat template problems, on the principle that if there was one harness more likely to better handle Qwen LLM specifics, it would be their own harness. I was actually pleasantly surprised; Qwen CLI delivered far beyond what I was expecting! In the later stages, I switched back to Claude Code, mostly to verify that the final chat template was working properly there too. What I noticed though is that developing in Claude Code was a lot slower than in Qwen CLI. This is due to all the extra prompts built within Claude Code.
- Context management and caching is super efficient in this model. Do not interfere with it. It works great, let it do its thing. Do not use any skill, plugin, etc., that manipulates the cache or context. This will result in confusing the model and making it a lot dumber and error-prone.
- Tool calls, context compaction, shell usage, subagents, parallel subagents, work flawlessly. Initially it did not, and it took me a long time and lots of work to get it right through chat template fixes and improvements. I actually only used context compaction for testing, and it was fine, as usual in Claude Code.
- Apart from Gemini, this is the first model that impressed me with its audio knowledge. As a composer, musician, psychoacoustic scientist, and audio engineer, I pay a lot of attention to good audio. In this case, I tasked it to do some advanced audio manipulation and creation. All the audio in the game comes from Qwen having programmed the web audio synthesizer in a highly advanced and complex way. This is not MIDI, not simple wavetables, not samples; it takes into account psychoacoustic properties tuned to human hearing with the use of harmonics, distortion, layers, various effects. Truly impressive work. The only exception is the waka-waka sound, for which I had to make it use a sample (the same method was used in the original arcade game).
- I can live with slow token generation speed. I used to think that I needed at least 70-80 tok/s for viable development. But this was usable; gave me time to do other things in parallel, and also allowed better reflection on the agentic tasks. I would probably not use it for large projects, with my current hardware, but for small to medium-sized projects, it is definitely acceptable.
If you read until here, let me know what you think, and I hope you enjoy the game.
Dev environment: macOS, Apple Silicon M2 Max, 96GB RAM, LLama.cpp server with OpenAI and Anthropic API endpoints.
“`
Originally published at reddit.com. Curated by AI Maestro.
Stay ahead of AI. Get the most important stories delivered to your inbox — no spam, no noise.




