Is using Q8 a waste of resources?

Reddit r/LocalLLaMA / 5/3/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post asks whether running a lightly quantized LLM (Q8) is an inefficient use of disk space and VRAM compared with more aggressively quantized variants like Q6_K.
  • The author compares models in terms of supported context length (e.g., 75k vs 145k) and expected performance (tokens per second), weighing the tradeoffs against available hardware.
  • Key concerns include whether moving from Q8 to Q6_K would significantly reduce model intelligence or overall capability.
  • The author also asks about the impact of quantization on vision capabilities and whether "Q6_K_XL" offers a meaningful improvement over "Q6_K".

I can run G4 31B Q8 XL with 75k ctx, and Qwen's 27B and 35B at Q8 XL with 145k ctx, but I'm wondering if I'm wasting GBs of SSD and VRAM.
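As a rough sanity check on the disk/VRAM side of the question, here is a minimal back-of-envelope sketch in Python. The bits-per-weight figures are approximate llama.cpp values, and the model dimensions in the example are hypothetical placeholders, not the real specs of the models named above:

```python
# Rough size estimate for quantized model weights plus KV cache.
# Bits-per-weight values are approximate llama.cpp figures; the
# model dimensions below are hypothetical, not the specs of any
# model mentioned in the post.

BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625, "Q4_K_M": 4.85}

def model_gb(params_b: float, quant: str) -> float:
    """Approximate on-disk / in-VRAM size of the weights in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_b * 1e9 * bits / 8 / 1e9

def kv_cache_gb(ctx: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: 2 tensors (K and V) per layer per token."""
    return 2 * ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

if __name__ == "__main__":
    # Hypothetical 31B model with GQA: 48 layers, 8 KV heads, 128-dim heads.
    for quant in ("Q8_0", "Q6_K"):
        w = model_gb(31, quant)
        kv = kv_cache_gb(ctx=75_000, n_layers=48, n_kv_heads=8, head_dim=128)
        print(f"{quant}: weights ~{w:.1f} GB + KV cache ~{kv:.1f} GB "
              f"= ~{w + kv:.1f} GB")
```

On numbers like these, the drop from Q8 to Q6_K frees several GB of weights, which can go toward a larger context window instead; the KV cache itself is unaffected by weight quantization.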

Is it worth dropping down to Q6_K to save disk space and gain a bit more t/s and more context? Or does intelligence deteriorate significantly, as measured by KL divergence ("KLD")?
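For context, the "KLD" here is the KL divergence between the reference model's next-token distribution and the quantized model's on the same prompts, a common metric for how much a quant changes behavior (llama.cpp's perplexity tool can report it). A minimal sketch of the computation, assuming you have already captured logits from both variants yourself:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL(P_ref || P_quant) over positions; inputs are
    (n_positions, vocab_size) logit arrays from the same prompts."""
    p = softmax(ref_logits)    # reference (e.g. Q8 or FP16) distribution
    q = softmax(quant_logits)  # quantized (e.g. Q6_K) distribution
    kld = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return float(kld.mean())

# Toy usage with random logits standing in for real model outputs:
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 32000))
quant = ref + rng.normal(scale=0.05, size=ref.shape)  # small perturbation
print(f"mean KLD: {mean_kld(ref, quant):.5f}")        # near 0 = near-identical
```

A mean KLD near zero means the quantized model almost always ranks tokens the same way as the reference, which is the usual argument for treating Q6_K as a near-lossless space saving.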

Is vision affected by using Q6_K?

Is Q6_K_XL much better than plain Q6_K?

submitted by /u/Spiderboyz1