Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s?

**What Happened:** A Reddit user, `/u/youcloudsofdoom`, asked what results others are getting from Qwen 3.6 (35B) on dual NVIDIA 3090 GPUs now that MTP (multi-token prediction) support has been merged. They said they had previously seen impressive results with MTP on their model, but that recent testing showed a slowdown to 80 t/g. They still preferred this to their previous CPU-fallback setup, which they put at 3500 p/p and 80 t/g. Here t/g and p/p appear to be the poster's shorthand for token-generation and prompt-processing speed.

**Why It Matters:** The post reflects ongoing experimentation with running large language models such as Qwen 3.6 on specific consumer hardware. Users want to know what performance is realistically achievable for this model on this setup, which matters to anyone trying to get the most throughput out of a local deployment. The discussion also shows the value of testing different configurations and parameters rather than assuming a single default is optimal.

– Users are eager to share and learn about new findings or improvements.
– There’s a need for robust benchmarks across various hardware setups to ensure consistent performance.
– Collaborative efforts among developers can lead to better model tuning and deployment strategies.

