Quantisation effects of Qwen3.6 35b a3b

Reddit r/LocalLLaMA / 4/25/2026

Key Points

A user on Reddit shares early, informal observations that upgrading VRAM and switching from UD-Q4_K_XL to Q8 on Qwen3.6 35B makes the model noticeably better, especially for tool calling and capturing nuance.
The user reports that the higher-precision quant also gives more well-rounded answers, including on research-style questions, based on quick prompt testing.
They plan to test UD-Q6_K_XL next to evaluate whether the additional VRAM cost (5+GB) is justified by improved quality.
The user speculates that with a small active parameter count, the model may be more sensitive to quantization effects, which matches what they saw during their quick test.
Overall, the post solicits other community members’ findings about how Qwen3.6 35B quality changes across quantization levels.

I'm curious how people are finding the quantisation effects of 35b. I recently upgraded to 48GB of VRAM, so I've jumped from ud-q4_k_xl to q8, and the difference feels stark. Tool calling is just more effective, it seems to pick up the vagueness and nuance of some prompts better, and it gives more well-rounded answers on some research-like questions.

It was a quick vibe test, admittedly, but I'm going to try ud-q6_k_xl soon to see whether the extra 5+ GB of VRAM is worth the quality. I'm curious to see others' findings.

I figured that with such a small active parameter count it'd be particularly sensitive to quantisation, and it feels that way after a play.
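For anyone weighing the same VRAM trade-off, here is a rough back-of-the-envelope sketch of weight memory at different quant levels. It assumes about 35e9 total parameters (read off the model name) and uses the nominal bits per weight of the base GGUF block formats. The UD-*_XL files mix tensor types, so their real sizes will differ; `weight_gb` and `size_table` are illustrative helper names, not part of any tool.

```python
# Nominal bits per weight for base GGUF block formats
# (block size in bytes * 8 / weights per block).
# Unsloth "UD-*_XL" files mix several tensor types, so their real
# sizes will not match these figures exactly.
BITS_PER_WEIGHT = {
    "Q4_K": 4.5,     # 144 bytes per 256 weights
    "Q6_K": 6.5625,  # 210 bytes per 256 weights
    "Q8_0": 8.5,     # 34 bytes per 32 weights
}

# Assumption: roughly 35e9 total parameters, taken from the model name.
TOTAL_PARAMS = 35e9


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Excludes KV cache, activations, and runtime buffers.
    """
    return n_params * bits_per_weight / 8 / 1e9


def size_table(n_params: float = TOTAL_PARAMS) -> dict[str, float]:
    """Estimated weight size in GB for each nominal quant format."""
    return {name: weight_gb(n_params, bpw) for name, bpw in BITS_PER_WEIGHT.items()}


for name, gb in size_table().items():
    print(f"{name}: ~{gb:.1f} GB of weights")
```

These are estimates for the weights only. Context length adds KV cache on top, and the published GGUF file sizes are the figures to trust when budgeting VRAM.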

submitted by /u/ROS_SDN