I had been seeing a TG (token generation) regression on both my main and non-main models until I tried the new b9254 build. After switching to it, TG is back to normal, and throughput is up about 3% on my two RTX 5060 Ti GPUs.
I also rebuilt with the PDL flag passed to cmake to see if it would improve things further. The results are consistent: about 3k t/s PP (prompt processing) and 127 t/s TG on my qwen3.6-35b-a3b-Q4_K_XL model, which is as good as or better than b9202. I'll do more testing to confirm.
Key Takeaways
- The new build (b9254) fixed my TG regression and improved performance by up to 3% on a pair of RTX 5060 Ti GPUs.
- With the PDL flag enabled in cmake, I got consistent results of about 3k t/s PP and 127 t/s TG using qwen3.6-35b-a3b-Q4_K_XL.
- I'll run further tests without the PDL flag to compare performance and confirm stability.