Let’s talk quants of Gemma and Qwen - 16 vs Q8 vs Q4 - any experiences?

Reddit r/LocalLLaMA / 5/20/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

The post invites community members to share their experiences running Gemma and Qwen models at different precision levels (16-bit, Q8, and Q4).
It highlights a common debate in the local LLM community about how low quantization can go before quality degrades too much (e.g., “never below Q8” versus “Q3 is acceptable”).
Participants are prompted to provide personal takes rather than citing a single definitive benchmark or recommendation.
The discussion centers on practical trade-offs between model size/efficiency and output quality when running LLMs locally.
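
To make the size side of that trade-off concrete, here is a rough back-of-the-envelope sketch, not a benchmark. It estimates weight storage only and ignores the KV cache, activations, and runtime overhead. The 7-billion-parameter count is a hypothetical example, not the size of any specific Gemma or Qwen model, and the bits-per-weight figures assume llama.cpp's Q8_0 and Q4_0 block layouts (32 weights plus one 16-bit scale per block); other quant types will differ.

```python
# Assumed effective bits per weight. Q8_0 and Q4_0 store one 16-bit scale
# per block of 32 weights, which adds 0.5 bits per weight on top of 8 or 4.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8_0": 8.5, "Q4_0": 4.5}


def weight_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 7e9  # hypothetical parameter count, for illustration only
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_size_gib(n_params, bpw):.1f} GiB")
```

This says nothing about output quality, which is the part the thread is actually asking about; it only shows why Q4 is tempting on limited VRAM.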

Some people say they’d never go under Q8, and others say they find Q3 acceptable! What’s your take?
