A British AI enthusiast, IvGranite, shared notes from benchmark runs testing ROCm and Memory Transfer Pool (MTP) optimizations across several LLaMA-family models and backends.
- Both ROCm and Vulkan showed significant drops in token generation speed at full context, though MTP recovered part of that loss in some configurations with larger models such as the 122B MoE.
- Vulkan held up far better at full context, dropping only about 12% in generation speed versus the roughly 64% decrease seen on ROCm.
- MTP showed mixed effects: it helped slightly on the Vulkan backend but caused a substantial (roughly 38%) drop for the MTP-enabled 122B model on ROCm.
These results illustrate how backend choice and optimizations like MTP affect generation throughput, especially at full context, where degradation is typically most pronounced, and they underline that the right backend depends on model size and use case.
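For concreteness, the percentages above are relative drops in tokens-per-second between a near-empty-context run and a full-context run. Below is a minimal sketch of that calculation; the throughput figures are illustrative placeholders chosen only to reproduce the quoted ~12% and ~64% drops, not the poster's actual measurements.

```python
# Minimal sketch: relative drop in generation throughput at full context.
# The tokens/sec values are illustrative, not the original benchmark data.

def throughput_drop(tps_short_ctx: float, tps_full_ctx: float) -> float:
    """Percentage drop in tokens/sec when moving to full context."""
    return (tps_short_ctx - tps_full_ctx) / tps_short_ctx * 100.0

# Placeholder numbers picked to match the quoted ~12% (Vulkan) and ~64% (ROCm) drops.
runs = {
    "Vulkan": (25.0, 22.0),
    "ROCm": (25.0, 9.0),
}

for backend, (short_ctx, full_ctx) in runs.items():
    drop = throughput_drop(short_ctx, full_ctx)
    print(f"{backend}: {short_ctx:.1f} -> {full_ctx:.1f} tok/s ({drop:.0f}% drop)")
```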
### Takeaways
- **ROCm vs Vulkan:** ROCm lost roughly 64% of its generation speed at full context, while Vulkan stayed comparatively stable with only about a 12% drop.
- **MTP Effectiveness:** MTP had mixed results; it helped slightly on Vulkan but caused a significant drop on ROCm.
- **Model Size Sensitivity:** The impact of these optimizations varies with model size and backend; larger models such as the 122B MoE were more sensitive to the changes.
Originally published at reddit.com. Curated by AI Maestro.




