At wit's end for optimizing settings in llama.cpp for 100k context


By AI Maestro May 20, 2026 1 min read

  • A Reddit user is struggling to optimize llama.cpp settings for their hardware, specifically when running Qwen3.5-35B-A3B (GGUF format) on macOS.
  • The user wants higher token throughput but ends up spending more time tuning settings than actually running inference.
  • Tools like llama-bench can in principle find optimal settings, but they may not capture every nuance of the user's environment or test every available flag efficiently.
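
One way to cut down on manual tuning is to let llama-bench sweep several values in a single run and then pick the fastest row from its output. The sketch below is illustrative only and is not the Reddit user's method. It assumes the `-m`, `-p`, `-n`, `-b`, `-ub`, and `-o json` options and the `avg_ts` output field behave as described in the llama-bench README at the time of writing, which may differ between llama.cpp versions. The model filename, prompt length, and helper names are hypothetical.

```python
import json
import shlex
import subprocess


def build_sweep_command(model_path, n_prompt, batch_sizes, ubatch_sizes):
    """Build an argv list for one llama-bench run that sweeps batch sizes.

    Assumption: llama-bench accepts comma-separated lists for -b and -ub
    and benchmarks every combination. Check `llama-bench --help` for your build.
    """
    def join(values):
        return ",".join(str(v) for v in values)

    return [
        "llama-bench",
        "-m", model_path,
        "-p", str(n_prompt),      # prompt-processing test length in tokens
        "-n", "0",                # skip the text-generation test
        "-b", join(batch_sizes),
        "-ub", join(ubatch_sizes),
        "-o", "json",             # machine-readable output
    ]


def pick_fastest(json_text):
    """Return the result row with the highest average tokens per second.

    Assumption: the JSON output is a list of objects with an `avg_ts` field.
    """
    rows = json.loads(json_text)
    return max(rows, key=lambda row: row["avg_ts"])


def run_sweep(cmd):
    """Run llama-bench and return its stdout (requires llama-bench on PATH)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Hypothetical model file and sizes; only prints the command, does not run it.
    cmd = build_sweep_command("model.gguf", 4096, [512, 1024, 2048], [256, 512])
    print(shlex.join(cmd))
```

Passing the output of `run_sweep(cmd)` to `pick_fastest` would give the best-performing combination for that one sweep. A short prompt length keeps each run quick, but results at 4096 tokens may not carry over to a 100k context, so the winning settings are worth re-checking at the target length.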



Originally published at reddit.com. Curated by AI Maestro.
