minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL

I’m on a MacBook M5 Max with 128GB RAM.

Running a test in Open WebUI using llama-server (llama.cpp):

unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (non-MTP): 19 tps
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (MTP): 22.3 tps

So nothing like the massive improvements I hear about. Possibly my own settings though.

Both runs use:

--temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 --cache-ram 24576 --batch-size 4096 --ubatch-size 2048

edit: forgot to add that I was using --spec-draft-n-max 2. I have changed it to 3 and also added --spec-draft-p-min 0.75, and now get 24.5 tps (for gen).

edit2: I reran with a coding-specific prompt and different models. Acceptance rate is ~95% for both MTP versions, so I can def tune more:

Qwen3.6-35B-A3B-UD-Q6_K (non-MTP): 83.82 tps
Qwen3.6-35B-A3B-UD-Q6_K_XL (MTP): 91.00 tps

Qwen3.6-27B-UD-Q6_K_XL (non-MTP): 17.44 tps
Qwen3.6-27B-UD-Q6_K_XL (MTP): 27.70 tps
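
For anyone who wants the relative gains rather than raw tps, here is a quick sketch that turns the numbers above into percentages. The tps values are copied from this post; the run labels and the `speedup_percent` helper are just made up for illustration, and I'm assuming the 24.5 tps run is compared against the same 19 tps baseline as the first test.

```python
# Generation speeds (tokens/s) copied from the post: (non-MTP, MTP).
# Labels are illustrative only; the 24.5 tps entry assumes the same
# 19 tps baseline as the first test.
runs = {
    "27B, first test, n-max 2": (19.0, 22.3),
    "27B, first test, n-max 3 + p-min 0.75": (19.0, 24.5),
    "35B-A3B, coding prompt": (83.82, 91.00),
    "27B, coding prompt": (17.44, 27.70),
}


def speedup_percent(baseline_tps, mtp_tps):
    """Relative gain of the MTP run over the non-MTP run, in percent."""
    return (mtp_tps / baseline_tps - 1.0) * 100.0


for label, (baseline, mtp) in runs.items():
    gain = speedup_percent(baseline, mtp)
    print(f"{label}: {baseline} -> {mtp} tps (+{gain:.1f}%)")
```

That works out to roughly 17% and 29% for the first test, and roughly 9% (35B-A3B) and 59% (27B) on the coding prompt, so the dense 27B model benefits far more than the MoE one in these runs.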

submitted by /u/chimph
