**What Happened:** A post on the r/LocalLLaMA subreddit asked for feedback on the best outputs people are getting from Qwen 3.6, a model from Alibaba Cloud's Qwen family, which the poster was running locally on two NVIDIA RTX 3090 GPUs. The discussion centered on how users were faring with the model after support for it had been merged into the local inference tooling they use.
**Why It Matters:** This post underscores the ongoing development and testing of large language models like Qwen 3.6, which is a critical step for improving their utility in various applications such as text generation, question answering, and more. Users are eager to share their experiences and best practices, helping the community refine these powerful tools.
– **Improved Speed:** The poster reported that moving to a newer build with MTP (multi-token prediction) support made generation noticeably faster, citing jumps in prompt processing (pp) and token generation (tg) throughput, both measured in tokens per second; a rough way to check such numbers is sketched after this list.
– **Best Practices Discussion:** This thread provides insights into how different users are configuring Qwen 3.6 for optimal performance, including strategies for managing resources and optimizing model configurations.
– **Community Collaboration:** By sharing their experiences and findings, the community can collectively improve the quality and efficiency of these models, which is crucial as they become more widely used in both research and commercial applications.
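The pp/tg figures quoted in threads like this typically come from dedicated benchmark tools, but a rough end-to-end check is easy to script. Below is a minimal sketch, assuming an OpenAI-compatible local server is already running; the URL, payload fields, and `usage` keys are illustrative assumptions, not details taken from the post.

```python
import time
import requests

# Assumption: an OpenAI-compatible local server (e.g. a llama.cpp-style
# server) is listening at BASE_URL and serving the model under test.
BASE_URL = "http://localhost:8080/v1/completions"  # hypothetical endpoint
PROMPT = "Explain the difference between prompt processing and token generation."
MAX_TOKENS = 256

start = time.perf_counter()
resp = requests.post(
    BASE_URL,
    json={"prompt": PROMPT, "max_tokens": MAX_TOKENS, "temperature": 0.0},
    timeout=300,
)
elapsed = time.perf_counter() - start
resp.raise_for_status()

# Many OpenAI-compatible servers return a token-count "usage" block;
# if yours does not, count tokens with the model's tokenizer instead.
usage = resp.json().get("usage", {})
prompt_tokens = usage.get("prompt_tokens", 0)
completion_tokens = usage.get("completion_tokens", 0)

print(f"prompt tokens: {prompt_tokens}, generated tokens: {completion_tokens}")
print(f"end-to-end throughput: {(prompt_tokens + completion_tokens) / elapsed:.1f} tok/s")
```

Note that this gives a single end-to-end rate; benchmark utilities that report pp and tg separately time the prompt pass and the generation loop independently, which is why the two numbers are usually quoted side by side.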
Originally published at reddit.com. Curated by AI Maestro.