MI50s Qwen 3.6 27B @52.8 tps TG @1569 tps PP (no MTP, no Quant)

Key Takeaways

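The headline figures are 52.8 tokens per second (TPS) for token generation (TG) and 1,569 TPS for prompt processing (PP). The summary also cites a throughput of 362.03 TPS, without stating which measurement it refers to.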
The inference engine is a vLLM fork (v0.20.1) running on ROCm 7.2.1.
The results are for single-request inference on two prompts of 1k and 15k tokens. MTP and DFlash were not used, because of their limitations and the desire for full precision.
According to the author's assessment, the model is fully usable with existing agentic harnesses such as Claude Code and Hermes.
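
To put the headline rates in context, here is a rough back-of-the-envelope sketch of what they imply for response time. It assumes the PP and TG rates from the title hold constant regardless of prompt length, which real systems rarely do. The function name `estimate_latency` and the 500-token output length are illustrative assumptions, not figures from the source.

```python
# Headline rates taken from the title; all other values are illustrative assumptions.
PP_TPS = 1569.0  # prompt processing (prefill) rate, tokens per second
TG_TPS = 52.8    # token generation (decode) rate, tokens per second


def estimate_latency(prompt_tokens: int, output_tokens: int) -> dict:
    """Estimate request latency, assuming constant PP and TG rates."""
    prefill_s = prompt_tokens / PP_TPS   # time to ingest the prompt
    decode_s = output_tokens / TG_TPS    # time to generate the reply
    return {
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": prefill_s + decode_s,
    }


if __name__ == "__main__":
    # 1k and 15k are read here as 1,000 and 15,000 tokens (an assumption).
    for prompt_tokens in (1_000, 15_000):
        est = estimate_latency(prompt_tokens, output_tokens=500)  # hypothetical reply length
        print(
            f"{prompt_tokens:>6} prompt tokens: "
            f"prefill {est['prefill_s']:.2f} s, "
            f"decode {est['decode_s']:.2f} s, "
            f"total {est['total_s']:.2f} s"
        )
```

Under these assumptions, prefill takes roughly 0.6 s for the 1k prompt and 9.6 s for the 15k prompt, so on long prompts the wait before the first token is comparable to the time spent generating a few hundred tokens.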

There is still room for improvement in both the software and hardware stacks. For instance, a PCIe switch could reduce latency, and more optimized DFlash/MTP implementations that add no extra overhead could be adopted in the future.

For Makers and Artists:

These results suggest that MI50 GPUs can run a 27B-parameter language model efficiently, which can benefit the development and execution of AI-powered creative tools.
Artists could use this model to bring more capable text generation into their workflows without significant hardware upgrades.

As the field continues to evolve, we can expect even better performance from models like Qwen in terms of both speed and capability. This opens up exciting possibilities for integrating advanced AI tools directly into creative pipelines, enabling new forms of collaboration between humans and machines.
