Been running a few models locally at different quant levels and honestly the jump from Q5 to Q4 sometimes feels like nothing and other times it completely tanks coherence on longer outputs. is there a general rule for where the cliff is, or does it just depend entirely on the model architecture and what you're doing with it. Would love to hear what quant levels people here actually settle on for daily use versus what they use when quality really matters
[link] [comments]




