minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL

By AI Maestro May 24, 2026 1 min read
I’m on a MacBook M5 Max with 128GB RAM.

Running a test in Open WebUI using llama-server (llama.cpp):

unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (non-MTP): 19 tps
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (MTP): 22.3 tps

So nothing like the massive improvements I hear about. Possibly my own settings though.

Both runs use:

--temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 --cache-ram 24576 --batch-size 4096 --ubatch-size 2048 

edit: forgot to add that I was using --spec-draft-n-max 2. I have changed it to 3 and also added --spec-draft-p-min 0.75, and now get 24.5 tps (for generation).
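To put the numbers so far in perspective, here is a quick sketch of the relative speedup over the 19 tps non-MTP baseline. The `speedup` helper is just an illustrative name, and the throughput figures are the ones quoted above:

```python
def speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Return MTP throughput as a multiple of the baseline throughput."""
    return mtp_tps / baseline_tps


baseline = 19.0  # non-MTP tokens/s, as reported in the post

# MTP tokens/s for the two draft settings reported in the post
runs = {
    "--spec-draft-n-max 2": 22.3,
    "--spec-draft-n-max 3, --spec-draft-p-min 0.75": 24.5,
}

for label, tps in runs.items():
    ratio = speedup(baseline, tps)
    print(f"{label}: {ratio:.2f}x ({(ratio - 1) * 100:.0f}% faster)")
```

That works out to roughly 1.17x with a draft length of 2 and 1.29x after the tuning in the edit.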

edit2: I reran with a coding-specific prompt and different models. Acceptance rate is ~95% for both MTP versions, so I can definitely tune more:

Qwen3.6-35B-A3B-UD-Q6_K (non-MTP): 83.82 tps
Qwen3.6-35B-A3B-UD-Q6_K_XL (MTP): 91.00 tps

Qwen3.6-27B-UD-Q6_K_XL (non-MTP): 17.44 tps
Qwen3.6-27B-UD-Q6_K_XL (MTP): 27.70 tps
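As a rough sanity check on how much headroom is left, the sketch below compares the measured speedups with an idealized ceiling from the usual speculative-decoding estimate. It rests on assumptions that are not in the post: each drafted token is accepted independently with probability 0.95, every step drafts the full draft length (a minimum-probability cutoff can stop drafting early), and drafting plus multi-token verification cost nothing. The function name and the draft lengths tried are illustrative.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with
    probability accept_rate (an idealization, not a measured property).
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


# (non-MTP tps, MTP tps) from the coding-prompt rerun in the post
measured = {
    "Qwen3.6-35B-A3B": (83.82, 91.00),
    "Qwen3.6-27B": (17.44, 27.70),
}

for name, (base_tps, mtp_tps) in measured.items():
    print(f"{name}: measured {mtp_tps / base_tps:.2f}x")

# Idealized upper bound at the ~95% acceptance rate quoted in the post
for draft_len in (2, 3):
    ceiling = expected_tokens_per_step(0.95, draft_len)
    print(f"draft_len={draft_len}: ceiling {ceiling:.2f}x")
```

The measured gains come out to about 1.09x for the 35B-A3B model and 1.59x for the 27B model, against idealized ceilings of about 2.85x and 3.71x. The gap is a ceiling-versus-reality comparison only, since the estimate ignores all overhead, but it is consistent with there being room to tune further.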

submitted by /u/chimph


Originally published at reddit.com. Curated by AI Maestro.
