AI Navigate

How bad is 1-bit quantization but on a big model?

Reddit r/LocalLLaMA / 3/11/2026

📰 News · Tools & Practical Usage

Key Points

  • The user is interested in running the Qwen3.5-397B-A17B model and noticed the IQ1_S and IQ1_M quantized versions are significantly smaller in size.
  • They are questioning the performance degradation or quality loss due to 1-bit quantization on such a large model compared to the original full-precision model.
  • They are also curious whether these 1-bit quantized models are comparable in effectiveness to smaller Qwen models like the 122B or 35B parameter versions.
  • The inquiry relates to practical considerations for deploying very large language models with quantization techniques that reduce model size and resource requirements.
  • This reflects ongoing community interest in balancing model size, quantization, and inference quality for large-scale AI models.
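The size gap the poster noticed follows from simple arithmetic: a model file is roughly parameter count times average bits per weight, divided by eight. A minimal sketch of that estimate, assuming the 397B total parameter count implied by the model name and rough average bits-per-weight figures for common llama.cpp quant types (real GGUF files mix quant types per tensor, so these are approximations):

```python
# Back-of-envelope: on-disk size ≈ parameters * bits-per-weight / 8,
# ignoring metadata overhead. The bpw values below are rough averages
# for llama.cpp quant types (assumption, not exact per-file figures).

def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 397e9  # total parameter count implied by "397B" in the name

# (name, approximate average bits per weight) — rough assumptions
for name, bpw in [("FP16", 16.0), ("Q8_0", 8.5), ("Q4_K_M", 4.8),
                  ("IQ2_XXS", 2.1), ("IQ1_M", 1.75), ("IQ1_S", 1.56)]:
    print(f"{name:8s} ~{quant_size_gb(n_params, bpw):6.0f} GB")
```

At roughly 1.56 bits per weight, IQ1_S would shrink a 397B model from ~794 GB at FP16 to under 80 GB, which explains both its appeal and why quality loss is the open question.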

I'm planning on running Qwen3.5-397B-A17B and saw that the IQ1_S and IQ1_M quants are quite small in size. How bad are they compared to the original, and are they comparable to, say, Qwen3.5 122B or 35B?

submitted by /u/FusionBetween