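llama.cpp recently added support for Programmatic Dependent Launch (PDL) on NVIDIA GPUs, specifically targeting the Blackwell architecture. PDL lets a dependent kernel begin launching before the kernel ahead of it in the same stream has fully finished, which hides part of the launch latency between consecutive kernels and is expected to improve performance.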
- PDL is not enabled by default; users need to build llama.cpp with the `-D GGML_CUDA_PDL=ON` flag for it to work.
- Initial benchmarks show a modest improvement in token generation performance, ranging from 4% to 10%. For instance, Qwen 3.6 35B-A3B MXFP4 saw a ~5% boost with PDL enabled.
- The feature can be disabled at runtime by setting the environment variable `GGML_CUDA_PDL=0` (for example, `export GGML_CUDA_PDL=0`).
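
The build flag and runtime toggle described in the list can be combined into a simple A/B comparison. The following is a minimal sketch, not an official recipe: it assumes a llama.cpp checkout in the current directory, a working CUDA toolchain, and the usual `-DGGML_CUDA=ON` option for CUDA builds. The function names `build_with_pdl` and `compare_pdl` and the model path are hypothetical placeholders introduced for illustration.

```shell
# Sketch only: assumes the current directory is a llama.cpp checkout
# and that a CUDA toolkit suitable for the target GPU is installed.

# Configure and build with CUDA and the PDL option named in the article.
build_with_pdl() {
    cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_PDL=ON
    cmake --build build --config Release
}

# Run llama-bench twice on the same model: once as built (PDL active),
# once with PDL switched off through the environment variable.
compare_pdl() {
    model="$1"   # path to a GGUF model file (placeholder)
    echo "== PDL enabled (as built) =="
    ./build/bin/llama-bench -m "$model"
    echo "== PDL disabled =="
    GGML_CUDA_PDL=0 ./build/bin/llama-bench -m "$model"
}

# Usage, after adjusting the model path:
#   build_with_pdl
#   compare_pdl /path/to/model.gguf
```

Setting the variable inline for the second run only, rather than with `export`, keeps the toggle from leaking into later commands in the same shell. Comparing the token generation rates of the two runs shows whether a given model and GPU fall inside the reported 4% to 10% range.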
PDL is an incremental but worthwhile performance gain for models running on Blackwell GPUs. Users already running llama.cpp on this hardware can benefit without buying anything new; the only change needed is a rebuild with PDL enabled.