Now that MTP is merged... What's the best outputs you're getting on Qwen 3.6 35B on 2x3090s?

**What Happened:** A post on r/LocalLLaMA asked what results other users are getting from Qwen 3.6 35B, a 35B-parameter model, on two NVIDIA RTX 3090 GPUs now that MTP is merged. The poster reported using a CPU fallback mechanism and seeing roughly 3500 pp and 80 tg. In the shorthand common in that community, these most likely stand for prompt processing and token generation speed, both in tokens per second. They compared this to earlier MTP builds, which were slower.
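To put the two reported figures in perspective, the sketch below estimates how long a single request would take. It assumes "pp" and "tg" are prompt-processing and token-generation throughput in tokens per second, and that the two phases simply run one after the other. The function name and the request size (7000 prompt tokens, 400 output tokens) are made up for illustration and do not come from the post.

```python
def estimate_latency_s(prompt_tokens: int, output_tokens: int,
                       pp_tps: float, tg_tps: float) -> float:
    """Rough end-to-end time: process the prompt, then generate the output.

    Ignores model load time, sampling overhead, and any slowdown at long
    context, so treat the result as a lower-bound style estimate.
    """
    if pp_tps <= 0 or tg_tps <= 0:
        raise ValueError("throughput values must be positive")
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# Throughput figures as reported in the post (assumed to be tokens/s).
# The request size is a hypothetical example.
latency = estimate_latency_s(7000, 400, pp_tps=3500, tg_tps=80)
print(f"{latency:.1f} s")  # 2.0 s for the prompt + 5.0 s for generation = 7.0 s
```

Under these assumptions generation dominates the total even though the prompt is much longer than the output, which is why token-generation speed is usually the number people compare.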

**Why It Matters:** The thread is about how well a 35B-parameter model performs on consumer hardware. Speeds and settings reported by other users help developers and researchers judge whether a setup like this suits their workload, and which configuration to start from when tuning it for a specific use case.

**Takeaways:**
– **Performance Variability:** Output speed varies with the MTP version and the configuration used.
– **Model Size Impact:** A 35B-parameter model is demanding on two 3090s (24 GB of VRAM each), so throughput depends on careful tuning.
– **Community Feedback Loop:** Threads like this let users share settings, test alternatives, and iterate toward better configurations.

