Build 9254 fixes my TG regression and adds PDL for NVIDIA GPUs

I had been seeing a token-generation (TG) regression on both my main and non-main models until I tried the new b9254 build. With b9254, TG is back to normal, and throughput is up 3% on my two RTX 5060 Ti GPUs.

I also ran the cmake build with the PDL (Programmatic Dependent Launch) flag to see whether it would improve things further. Results are consistent: about 3k PP/s and 127 tg/s on my qwen3.6-35b-a3b-Q4_K_XL model, which is as good as or better than b9202. I'll do more testing to confirm.
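For anyone who wants to put a number on "as good as or better", here is a minimal sketch of how repeated tg/s readings from two builds could be compared. The function names are my own, and the readings in the usage section are placeholders, not measurements from my runs.

```python
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, sample standard deviation) for a list of tokens/s readings."""
    spread = stdev(samples) if len(samples) > 1 else 0.0
    return mean(samples), spread


def percent_change(baseline, candidate):
    """Relative change of candidate over baseline, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100.0


def compare(baseline_runs, candidate_runs):
    """Compare two sets of readings and report the change alongside run-to-run noise."""
    base_mean, base_sd = summarize(baseline_runs)
    cand_mean, cand_sd = summarize(candidate_runs)
    return {
        "baseline_mean": base_mean,
        "candidate_mean": cand_mean,
        "change_pct": percent_change(base_mean, cand_mean),
        "noise": max(base_sd, cand_sd),  # larger of the two standard deviations
    }


if __name__ == "__main__":
    # Placeholder readings for illustration only, NOT results from this post.
    without_pdl = [100.0, 101.0, 99.0]
    with_pdl = [103.0, 104.0, 102.0]
    result = compare(without_pdl, with_pdl)
    print(f"{result['change_pct']:+.1f}% tg/s, noise ~{result['noise']:.1f} t/s")
```

If the difference between the two means is not clearly larger than the noise figure, I would treat the change as inconclusive and collect more runs.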

Key Takeaways

The new build (b9254) fixed my TG regression and improved throughput by 3% on a pair of RTX 5060 Ti GPUs.
Building with the PDL flag via cmake gave consistent results of about 3k PP/s and 127 tg/s on qwen3.6-35b-a3b-Q4_K_XL, as good as or better than b9202.
More testing is planned to confirm these results.
