At wit's end optimizing llama.cpp settings for 100k context

A Reddit user is struggling to tune llama.cpp for their hardware while running Qwen3.5-35B-A3B (GGUF format) on macOS with a 100k-token context.
Their goal is higher token throughput, but they find themselves spending more time adjusting settings than actually running inference.
The difficulty is that tools like llama-bench can in principle find good settings, but a benchmark run may not reflect every detail of the user's environment, and sweeping every available flag by hand is slow.
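One way to cut down the manual trial and error is to script the sweep. The sketch below is only an illustration, not the poster's method: it builds one llama-bench command per combination of batch size, micro-batch size, and flash-attention setting, then ranks the parsed results. It assumes `llama-bench` is on the PATH, that the installed build accepts `-b`, `-ub`, `-fa 0/1`, `-r`, and `-o json`, and that each JSON result row carries an `avg_ts` (average tokens per second) field. The function names, the model path, and the default prompt length are hypothetical, so check `llama-bench --help` before relying on any of them.

```python
import itertools
import json
import subprocess


def build_commands(model_path, batch_sizes, ubatch_sizes, flash_attn_values,
                   n_prompt=4096, repetitions=3):
    """Return one llama-bench argv list per combination of settings."""
    commands = []
    for b, ub, fa in itertools.product(batch_sizes, ubatch_sizes, flash_attn_values):
        if ub > b:
            # Skip combinations where the micro-batch exceeds the logical batch.
            continue
        commands.append([
            "llama-bench",
            "-m", model_path,
            "-p", str(n_prompt),    # prompt-processing test length
            "-n", "0",              # no generation test; measure prompt speed only
            "-b", str(b),
            "-ub", str(ub),
            "-fa", str(fa),         # assumption: this build accepts 0/1 here
            "-r", str(repetitions),
            "-o", "json",
        ])
    return commands


def rank_results(rows):
    """Sort result rows by throughput, fastest first.

    Assumes each row has an "avg_ts" field, as in llama-bench JSON output.
    """
    return sorted(rows, key=lambda row: row["avg_ts"], reverse=True)


def run_sweep(commands):
    """Run each command and collect the parsed JSON rows (requires llama-bench)."""
    rows = []
    for argv in commands:
        completed = subprocess.run(argv, capture_output=True, text=True, check=True)
        rows.extend(json.loads(completed.stdout))
    return rank_results(rows)
```

Calling `run_sweep(build_commands("model.gguf", [512, 2048], [256, 512], [0, 1]))` would return the rows fastest first, with `"model.gguf"` standing in for the real file. A short prompt length keeps each run quick, but the ranking at 4096 tokens may not hold at 100k, so the top few candidates are worth re-running with a larger `n_prompt` before settling on one.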