Now that MTP is merged… What’s the best outputs you’re getting on Qwen 3.6 35B on 2x3090s?

By AI Maestro May 17, 2026 1 min read

**What Happened:** A thread on Reddit titled “Now that MTP is merged… What’s the best outputs you’re getting on Qwen 3.6 35B on 2x3090s?” has sparked discussion among users running the new MTP merge on dual NVIDIA RTX 3090 GPUs. The thread notes that some users previously achieved impressive performance at around 1500 tokens/s prompt processing (pp) and 120 tokens/s token generation (tg), but generation has slowed to roughly 80 tg with MTP enabled. One user is sticking with their CPU fallback at 3500 pp and 80 tg until they find a better solution.

**Why It Matters:** This discussion underscores the ongoing experimentation and optimization within the AI community, particularly around how different models perform on specific hardware configurations. The merging of MTP (Multi-Token Prediction) has evidently changed performance characteristics, prompting users to reassess their setups. For those running large models like Qwen 3.6 35B on consumer GPUs, understanding these trade-offs is crucial for maintaining optimal throughput and efficiency.

– **Users are seeking insights into new model outputs after the MTP merge.**
– **Performance has varied post-merge, with some users experiencing a reduction in token generation rate.**
– **There’s an ongoing need to fine-tune configurations or find alternative solutions to achieve desired output rates.**


Originally published at reddit.com. Curated by AI Maestro.
