Consider running a bigger quant if possible

Reddit r/LocalLLaMA / 4/22/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage

Key Points

  • The author suggests that if hardware allows, running a larger quantized model can significantly improve real-world behavior compared with smaller quants.
  • They report that Qwen 3.6 IQ4_XS at 128k context underperformed badly due to looping, formatting mistakes, and incorrect implementations.
  • After switching to the unsloth IQ4_NL_XL (with some VRAM headroom), they found it worked much better for agentic coding tasks.
  • They advise not to judge models purely by tok/s or VRAM fit; instead, measure end-to-end task time and expect that a slightly slower model that finishes correctly can be faster overall.

Just a little reminder that *if* it is possible for you to run bigger quants, do it. I ran Qwen 3.6 IQ4_XS at 128k context and was very disappointed: it would loop, make formatting errors, implement the wrong things, etc. I had a little bit of VRAM headroom and decided to give the new unsloth IQ4_NL_XL a try, and what can I say, it works MUCH better for agentic coding. If you are like me and start conservative with your model selection based on what completely fits into VRAM, it might worsen your experience to a large degree. Always look at how long a task really takes end to end and ignore tok/s for quant comparisons. You get stuff done faster if the model with slower tok/s (even with offload) takes less time to complete queries correctly (duh).
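The tok/s-vs-wall-clock point can be made concrete with a little arithmetic. Below is a minimal sketch (the numbers are purely illustrative, not benchmarks from the post): a smaller quant that decodes faster but loops and needs retries can lose to a slower, bigger quant that gets the task right on the first attempt.

```python
# Illustrative sketch: judge quants by end-to-end task time, not raw tok/s.
# All figures here are made-up assumptions for demonstration.

def task_time(tok_per_s: float, tokens_per_attempt: int, attempts: int) -> float:
    """Wall-clock seconds until a correct result, assuming each failed
    attempt burns a full generation before you retry."""
    return attempts * tokens_per_attempt / tok_per_s

# Smaller quant: faster decoding, but loops/misformats and needs retries.
small_quant = task_time(tok_per_s=40.0, tokens_per_attempt=2000, attempts=4)

# Bigger quant (maybe partially offloaded): slower, but correct first try.
big_quant = task_time(tok_per_s=25.0, tokens_per_attempt=2000, attempts=1)

print(f"small quant: {small_quant:.0f}s, big quant: {big_quant:.0f}s")
# → small quant: 200s, big quant: 80s
```

Under these assumed numbers, the "slower" 25 tok/s model finishes the task in 80 s while the 40 tok/s model needs 200 s, which is exactly the author's point about measuring how long the task really takes.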

submitted by /u/Flashy_Management962