I'm on a MacBook M5 Max with 128GB RAM.
Running a test in Open WebUI using llama-server (llama.cpp):
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (non-MTP): 19 tps
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (MTP): 22.3 tps
So nothing like the massive improvements I hear about. Possibly my own settings though.
Both use:
--temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 --cache-ram 24576 --batch-size 4096 --ubatch-size 2048
Edit: I forgot to add that I was using --spec-draft-n-max 2. I have changed it to 3 and also added --spec-draft-p-min 0.75, and now get 24.5 tps (for generation).
Edit 2: I reran with a coding-specific prompt and different models. Acceptance rate is at ~95% for both MTP versions, so I can definitely tune more:
Qwen3.6-35B-A3B-UD-Q6_K (non-MTP): 83.82 tps
Qwen3.6-35B-A3B-UD-Q6_K_XL (MTP): 91.00 tps
Qwen3.6-27B-UD-Q6_K_XL (non-MTP): 17.44 tps
Qwen3.6-27B-UD-Q6_K_XL (MTP): 27.70 tps
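To put those numbers in perspective, here is a rough Python sketch that turns the tps figures above into speedup ratios and estimates how many tokens each verification pass could yield at a given acceptance rate. The helper names are made up for illustration. The second function assumes the ~95% figure is a per-token acceptance probability, that each drafted token is accepted independently, and that --spec-draft-n-max is the draft length, none of which the runs above confirm, so treat its result as an optimistic ceiling rather than a prediction.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Generation speedup of the MTP run relative to the non-MTP run."""
    return mtp_tps / baseline_tps


def expected_tokens_per_step(accept_rate: float, draft_n: int) -> float:
    """Expected tokens produced per verification pass.

    Assumption (not from the post): each drafted token is accepted
    independently with probability accept_rate, and drafting stops at the
    first rejection. The pass always yields at least one token.
    """
    if accept_rate >= 1.0:
        return float(draft_n + 1)
    return (1.0 - accept_rate ** (draft_n + 1)) / (1.0 - accept_rate)


# (non-MTP tps, MTP tps) taken from the results in the post.
runs = {
    "27B, first run (n-max 3)": (19.0, 24.5),
    "35B-A3B, coding prompt": (83.82, 91.00),
    "27B, coding prompt": (17.44, 27.70),
}

for name, (base, mtp) in runs.items():
    print(f"{name}: {speedup(base, mtp):.2f}x")

# Idealized tokens per pass at 95% acceptance for a few draft lengths.
for n in (2, 3, 4):
    print(f"draft_n={n}: {expected_tokens_per_step(0.95, n):.2f} tokens/pass")
```

The measured speedups (roughly 1.09x for the 35B-A3B model and 1.59x for the 27B model on the coding prompt) sit well below the idealized tokens-per-pass figure, which is expected since drafting and verification have their own cost. The gap still suggests there is room to experiment with a larger draft length.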
submitted by /u/chimph
Originally published at reddit.com. Curated by AI Maestro.