A Reddit user has shared a new best recipe for the Qwen model. The updated version, referred to as "autorund-best", uses more iterations to improve quality and performance when served with vLLM on an RTX 5090.
- The new recipe is available on Hugging Face as webhie/Qwen3.6-27B-int4-AutoRound for the model and webhie/Qwen3.6-27B-int4-AutoRound-Code for the calibration dataset.
- The token generation rate is 130-160 tps without MTP and 290-320 tps with MTP 3.
- For issues with other Qwen models, users are advised to try v11 from froggeric/Qwen-Fixed-Chat-Templates.
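
To put the reported rates in perspective, here is a minimal sketch that turns the two tps ranges into a rough speedup range. The function name `speedup_range` is hypothetical, and the sketch assumes the extremes of the two ranges can be combined freely. The post does not say which runs were paired, so the result is a loose bound rather than a measurement.

```python
# Token generation rates reported in the post, in tokens per second (tps).
BASELINE_TPS = (130, 160)  # without MTP
MTP_TPS = (290, 320)       # with MTP 3


def speedup_range(baseline, accelerated):
    """Return the (lowest, highest) speedup implied by two (low, high) rate ranges.

    Assumption: any baseline rate may pair with any accelerated rate, so the
    lowest speedup is slowest-accelerated over fastest-baseline, and the
    highest is fastest-accelerated over slowest-baseline.
    """
    lowest = accelerated[0] / baseline[1]
    highest = accelerated[1] / baseline[0]
    return lowest, highest


low, high = speedup_range(BASELINE_TPS, MTP_TPS)
print(f"MTP speedup: roughly {low:.2f}x to {high:.2f}x")
```

Under that assumption, the reported numbers imply a speedup of roughly 1.8x to 2.5x from enabling MTP.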




