I'm curious how people are finding the quantisation effects on 35b. I recently upgraded to 48GB of VRAM, so I've jumped from ud-q4_k_xl to q8, and the difference feels stark. Tool calling is more effective, it seems to pick up the vagueness and nuance of some prompts better, and it gives more well-rounded answers on some research-like questions.
It was a quick vibe test, admittedly, but I'm going to try ud-q6_k_xl soon to see how much of the 5+GB of VRAM is worth it for the quality. I'm curious to see others' findings.
I figured that with such a small active parameter count it'd be particularly sensitive to quantisation, and it feels that way after a play.




