- A developer asks other developers who run long-context models such as Qwen whether they notice any quality difference between a smaller (4k) and a larger (8k) key-value cache in their applications.
- The question matters because the developer wants to reduce the VRAM footprint of these models but is concerned about quality degradation at longer context lengths. They are testing two variants of the same model: a smaller, dense Qwen 3.6 and a larger MoE variant.
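To see why the key-value cache dominates VRAM at long context, a back-of-the-envelope estimate helps: the cache grows linearly with context length, so doubling it from 4k to 8k tokens doubles that memory cost. The sketch below uses hypothetical model dimensions (layer count, KV heads, head size are illustrative assumptions, not figures from the post).

```python
def kv_cache_bytes(seq_len, num_layers, num_kv_heads, head_dim,
                   bytes_per_elem=2, batch=1):
    """Estimate KV-cache memory: K and V each store one vector
    per token, per layer, per KV head (2 bytes/elem = fp16)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch

# Hypothetical dense-model dimensions (assumed for illustration):
layers, kv_heads, head_dim = 36, 8, 128

for ctx in (4096, 8192):
    gib = kv_cache_bytes(ctx, layers, kv_heads, head_dim) / 2**30
    print(f"{ctx}-token cache: {gib:.2f} GiB")
```

Because the cost scales linearly in `seq_len`, halving the cache (or quantizing its elements to fewer bytes) is an attractive VRAM saving, which is exactly why the quality question above comes up.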
Originally published at reddit.com. Curated by AI Maestro.
