I’m running the 122-billion-parameter Qwen 3.5, and I’m (very!) impressed with the general-knowledge output. I can talk to it in multiple languages, and I don’t feel the need to consult online frontier models for encyclopaedic, general "handyman", or other day-to-day questions; my local Qwen seems sufficient. That said, the output seems slow, around 19 tokens/s. Is this speed expected? I’m serving the model with llama-server (latest compile as of yesterday), and the chat UI is Open WebUI. Are there any speed optimizations I can make in this setup without compromising output quality?
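For anyone who wants to reproduce the measurement, here is a minimal sketch of a streaming benchmark against llama-server's OpenAI-compatible endpoint. Assumptions: llama-server is on its default port 8080, and the model name and prompt are placeholders (llama-server serves whichever model it was launched with, regardless of the `model` field).

```python
import json
import time
import requests

# Assumption: llama-server is running locally on its default port (8080)
# and exposes the OpenAI-compatible /v1/chat/completions endpoint.
URL = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "qwen3.5",  # placeholder; llama-server serves its loaded model regardless
    "messages": [{"role": "user", "content": "Explain how a heat pump works."}],
    "max_tokens": 256,
    "stream": True,  # stream so we can time token arrival
}

start = time.time()
first_token_at = None
chunks = 0

with requests.post(URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events: each payload line starts with "data: "
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):
            if first_token_at is None:
                first_token_at = time.time()  # time to first token
            chunks += 1  # each streamed chunk usually carries ~one token

if first_token_at is None:
    print("No content received.")
else:
    gen_time = time.time() - first_token_at
    print(f"TTFT {first_token_at - start:.2f}s; "
          f"~{chunks / max(gen_time, 1e-9):.1f} chunks/s ≈ tokens/s")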
submitted by /u/breksyt