At what point does quantization stop being a tradeoff and start being actual quality loss?

Reddit r/LocalLLaMA / 4/15/2026

💬 Opinion, Ideas & Deep Analysis, Tools & Practical Usage

Key Points

The post discusses user experiences with local model quantization, noting that the quality impact can vary widely when moving between nearby quant levels (e.g., Q5 to Q4).
It raises the question of whether there is a general “cliff point” where quantization switches from acceptable tradeoff to clear coherence/quality loss.
The author suggests that the threshold may depend on factors such as model architecture and the type/length of generation being performed.
Readers are asked to share which quantization levels they use for everyday use versus when high quality is critical.

Been running a few models locally at different quant levels, and honestly the jump from Q5 to Q4 sometimes feels like nothing and other times completely tanks coherence on longer outputs. Is there a general rule for where the cliff is, or does it just depend entirely on the model architecture and what you're doing with it? Would love to hear what quant levels people here actually settle on for daily use versus what they use when quality really matters.
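
For anyone who wants a rough feel for why nearby quant levels can behave so differently, here is a toy sketch that measures how much raw weight error each bit width introduces. It assumes plain symmetric round-to-nearest quantization over blocks of 32 random Gaussian "weights", which is not the scheme real GGUF quants such as Q4_K or Q5_K use, and the function names (`quantize_block`, `rms_error`, `error_by_bits`) are made up for illustration. Weight error is also only a proxy: it says nothing directly about coherence on long outputs.

```python
import math
import random


def quantize_block(block, bits):
    """Symmetric round-to-nearest quantization of one block of floats.

    Toy scheme (an assumption for illustration): one scale per block,
    signed integer levels in [-levels, +levels].
    """
    levels = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 15 for 5-bit
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return list(block)  # nothing to quantize in an all-zero block
    scale = max_abs / levels
    # Quantize to an integer level, clamp, then dequantize back to float.
    return [max(-levels, min(levels, round(x / scale))) * scale for x in block]


def rms_error(weights, bits, block_size=32):
    """Root-mean-square difference between original and dequantized weights."""
    total = 0.0
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        dequantized = quantize_block(block, bits)
        total += sum((a - b) ** 2 for a, b in zip(block, dequantized))
    return math.sqrt(total / len(weights))


def error_by_bits(n=4096, seed=0, bit_widths=(8, 6, 5, 4, 3, 2)):
    """RMS error per bit width for synthetic Gaussian weights (not a real model)."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {bits: rms_error(weights, bits) for bits in bit_widths}


if __name__ == "__main__":
    for bits, err in error_by_bits().items():
        print(f"{bits}-bit: RMS error {err:.4f}")
```

In this toy setup each bit removed roughly doubles the quantization step, so the error compounds quickly at the low end. Whether that shows up as a visible cliff in practice would still depend on the model, the actual quant format, and the task, which is the question being asked here.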

submitted by /u/srodland01