Blackwell and PDL performance increase

llama.cpp recently added support for Programmatic Dependent Launch (PDL) on NVIDIA GPUs, specifically targeting the Blackwell architecture. PDL lets a dependent kernel begin launching before the kernel it depends on has fully finished, which hides part of the launch latency between back-to-back kernels.

PDL is not enabled by default; users need to build llama.cpp with the CMake option `-D GGML_CUDA_PDL=ON` for it to work.
Initial benchmarks show a modest but noticeable improvement in token generation performance, ranging from 4% to 10%. For instance, on Qwen 3.6 35B.A3B MXFP4, enabling PDL gave a ~5% boost.
In a build with PDL enabled, it can be turned off by setting the environment variable `GGML_CUDA_PDL=0` (for example, `export GGML_CUDA_PDL=0`).

PDL is an incremental improvement, but a worthwhile one for models running on Blackwell GPUs. Users already running llama.cpp on this hardware can get a noticeable speedup by rebuilding with the option enabled, without any new hardware.
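One way to check whether PDL helps on a particular machine is to run the same benchmark twice, once with the default setting and once with `GGML_CUDA_PDL=0`, and compare tokens per second. The sketch below covers only the bookkeeping for such a comparison. The helper names `pdl_env` and `speedup_percent` are hypothetical, and the throughput figures are placeholders, not measurements.

```python
import os


def pdl_env(enabled, base_env=None):
    """Return a copy of the environment for one benchmark run.

    Assumption taken from the article: GGML_CUDA_PDL=0 disables PDL.
    When PDL should stay on, the variable is left unset so the build
    default applies.
    """
    env = dict(os.environ if base_env is None else base_env)
    if enabled:
        env.pop("GGML_CUDA_PDL", None)
    else:
        env["GGML_CUDA_PDL"] = "0"
    return env


def speedup_percent(baseline_tps, pdl_tps):
    """Relative change in tokens/second, as a percentage of the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return (pdl_tps - baseline_tps) / baseline_tps * 100.0


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real measurements.
    baseline_tps = 100.0  # tokens/s with GGML_CUDA_PDL=0
    pdl_tps = 105.0       # tokens/s with PDL left enabled
    print(f"{speedup_percent(baseline_tps, pdl_tps):.1f}% faster with PDL")
```

The dictionary returned by `pdl_env` can be passed as the `env` argument of `subprocess.run` when launching whatever benchmark command is in use, so both runs differ only in the PDL setting.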
