Is using Q8 a waste of resources?

Reddit r/LocalLLaMA / 5/3/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The post asks whether running a lightly quantized LLM (Q8) is an inefficient use of disk space and VRAM compared with more aggressively quantized variants like Q6_K.
  • The author compares models in terms of supported context length (e.g., 75k vs 145k) and expected performance (tokens per second), weighing the tradeoffs against available hardware.
  • Key concerns include whether moving from Q8 to Q6_K would significantly reduce model intelligence or overall capability.
  • The author also asks about the impact of quantization on vision capabilities and whether "Q6_K_XL" offers a meaningful improvement over "Q6_K".

I can run G4 31B Q8 XL with 75k ctx, and Qwen's 27B and 35B at Q8 XL with 145k ctx, but I'm wondering if I'm wasting GBs of SSD and VRAM.
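As a rough sanity check on the disk/VRAM side of the question, here is a minimal back-of-envelope sketch in Python. The bits-per-weight figures are approximate llama.cpp values, and the model dimensions in the example are hypothetical placeholders, not the real specs of the models named above:

```python
# Rough size estimate for quantized model weights plus KV cache.
# Bits-per-weight values are approximate llama.cpp figures; the
# model dimensions below are hypothetical, not the specs of any
# model mentioned in the post.

BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.5625, "Q4_K_M": 4.85}

def model_gb(params_b: float, quant: str) -> float:
    """Approximate on-disk / in-VRAM size of the weights in GB."""
    bits = BITS_PER_WEIGHT[quant]
    return params_b * 1e9 * bits / 8 / 1e9

def kv_cache_gb(ctx: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache: 2 tensors (K and V) per layer per token."""
    return 2 * ctx * n_layers * n_kv_heads * head_dim * bytes_per_elem / 1e9

if __name__ == "__main__":
    # Hypothetical 31B model with GQA: 48 layers, 8 KV heads, 128-dim heads.
    for quant in ("Q8_0", "Q6_K"):
        w = model_gb(31, quant)
        kv = kv_cache_gb(ctx=75_000, n_layers=48, n_kv_heads=8, head_dim=128)
        print(f"{quant}: weights ~{w:.1f} GB + KV cache ~{kv:.1f} GB "
              f"= ~{w + kv:.1f} GB")
```

On numbers like these, the drop from Q8 to Q6_K frees several GB of weights, which can go toward a larger context window instead; the KV cache itself is unaffected by weight quantization.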

Is it worth dropping down to Q6_K to save disk space and gain a bit more t/s and more context? Or does intelligence deteriorate significantly, as measured by KL divergence ("KLD")?
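For context, the "KLD" here is the KL divergence between the reference model's next-token distribution and the quantized model's on the same prompts, a common metric for how much a quant changes behavior (llama.cpp's perplexity tool can report it). A minimal sketch of the computation, assuming you have already captured logits from both variants yourself:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL(P_ref || P_quant) over positions; inputs are
    (n_positions, vocab_size) logit arrays from the same prompts."""
    p = softmax(ref_logits)    # reference (e.g. Q8 or FP16) distribution
    q = softmax(quant_logits)  # quantized (e.g. Q6_K) distribution
    kld = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return float(kld.mean())

# Toy usage with random logits standing in for real model outputs:
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 32000))
quant = ref + rng.normal(scale=0.05, size=ref.shape)  # small perturbation
print(f"mean KLD: {mean_kld(ref, quant):.5f}")        # near 0 = near-identical
```

A mean KLD near zero means the quantized model almost always ranks tokens the same way as the reference, which is the usual argument for treating Q6_K as a near-lossless space saving.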

Is vision affected by using Q6_K?

Is Q6_K_XL much better than plain Q6_K?

submitted by /u/Spiderboyz1