Let’s talk quants of Gemma and Qwen - 16 vs Q8 vs Q4 - any experiences?

Reddit r/LocalLLaMA / 5/20/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post invites community members to share their experiences with Gemma and Qwen models at different precision levels: unquantized 16-bit weights, Q8, and Q4.
  • It highlights a common debate in the local LLM community about how low quantization can go before quality degrades too much (e.g., “never below Q8” versus “Q3 is acceptable”).
  • Participants are prompted to provide personal takes rather than citing a single definitive benchmark or recommendation.
  • The discussion centers on practical trade-offs between model size/efficiency and output quality when running LLMs locally.
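For a rough sense of the size side of that trade-off, here is a minimal sketch of the arithmetic. It assumes a hypothetical 7-billion-parameter model and nominal bit widths. Real quantized formats store per-block scales and often keep some tensors at higher precision, so actual files come out somewhat larger, and the KV cache and activations add to memory use on top of the weights.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a hypothetical 7B-parameter model, not a specific Gemma or Qwen release.
N_PARAMS = 7e9

# Nominal bits per weight only; real formats carry extra overhead for scales.
for name, bits in [("16-bit", 16), ("Q8", 8), ("Q4", 4), ("Q3", 3)]:
    print(f"{name:>6}: ~{weight_size_gb(N_PARAMS, bits):.1f} GB")
```

This only covers the size half of the debate. It says nothing about how much output quality is lost at each level, which is what the thread is asking people to report.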

Some people say they’d never go under Q8, and others say they find Q3 acceptable! What’s your take?

submitted by /u/Borkato