Now that MTP is merged… What’s the best outputs you’re getting on Qwen 3.6 35B on 2x3090s?

By AI Maestro May 17, 2026 1 min read

**What Happened:** A Reddit post titled “Now that MTP is merged… What’s the best outputs you’re getting on Qwen 3.6 35B on 2x3090s?” sparked discussion among users interested in the performance of Qwen 3.6 35B, a 35-billion-parameter large language model, running on two NVIDIA RTX 3090 GPUs. The post invited readers to share their experiences and best results with this setup, prompted by MTP (multi-token prediction) support having recently been merged into the inference framework.

**Why It Matters:** The discussion highlights the ongoing evolution of large language model inference, particularly throughput and output consistency across different hardware configurations. Readers want to know how the newly merged MTP support affects speed and quality when running Qwen on consumer multi-GPU setups, and the thread serves as a venue for sharing practical tips and improvements discovered since the merge.
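For readers unfamiliar with why an MTP merge would change throughput: multi-token prediction is a form of speculative decoding, where cheap draft tokens are proposed ahead of time and the full model verifies them, accepting the longest correct prefix. The toy sketch below illustrates only the accept/reject logic with stand-in functions (none of these names come from the post or any real framework); a real implementation verifies all drafted tokens in a single batched forward pass rather than one call per token.

```python
def full_model_next(prefix):
    # Stand-in for the expensive full model: a deterministic toy rule
    # that always emits a token named after the current position.
    return f"t{len(prefix)}"

def draft_tokens(prefix, k):
    # Stand-in cheap drafter: agrees with the full model except that it
    # guesses wrong at every third position, to show partial acceptance.
    out = []
    for i in range(k):
        pos = len(prefix) + i
        out.append("wrong" if pos % 3 == 2 else f"t{pos}")
    return out

def speculative_step(prefix, k=4):
    # Propose k tokens, keep the longest prefix the full model agrees with.
    proposed = draft_tokens(prefix, k)
    accepted = []
    for tok in proposed:
        if full_model_next(prefix + accepted) == tok:
            accepted.append(tok)
        else:
            break
    # On a mismatch the step still makes progress: take the full model's
    # own token at the point of disagreement.
    if len(accepted) < k:
        accepted.append(full_model_next(prefix + accepted))
    return prefix + accepted

print(speculative_step([]))  # accepts t0 and t1, then falls back to t2
```

The speedup comes from amortization: when the drafter is usually right, each expensive verification pass yields several tokens instead of one.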

– **Qwen Performance Variations:** Users compare output quality and generation speed across different layer-offload settings and configurations of Qwen.
– **Hardware Optimization Insights:** There is interest in how best to split the model across dual RTX 3090s (48 GB of combined VRAM) for maximum throughput.
– **Best Practices for Large Language Models:** The post encourages sharing strategies for running 35B-class models, including techniques for reducing memory usage and handling long outputs efficiently.
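The memory question behind these bullets can be made concrete with back-of-the-envelope arithmetic. This is a rough sketch, not a measurement from the thread: the flat 2 GB allowance for KV cache and activations, and the bits-per-weight figures for the quantization formats, are assumptions.

```python
def vram_gb(params_b, bits_per_weight, overhead_gb=2.0):
    # Weight memory in GB (billions of params x bits / 8 bits-per-byte),
    # plus a flat allowance for KV cache and activations (an assumption).
    return params_b * bits_per_weight / 8 + overhead_gb

# Approximate bits per weight for common formats (assumed values).
for name, bpw in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    need = vram_gb(35, bpw)
    verdict = "fits" if need <= 48 else "does not fit"
    print(f"{name}: ~{need:.1f} GB -> {verdict} in 2x24 GB")
```

Under these assumptions a 35B model does not fit in 48 GB at FP16 (~72 GB), but fits comfortably at 8-bit (~39 GB) or 4-bit (~23 GB), which is why quantization choice dominates the dual-3090 discussion.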


Originally published at reddit.com. Curated by AI Maestro.
