I’m wondering: if we take models of the same family (e.g. Qwen3.5 MoEs) and compare GGUFs with different parameter counts and different quantizations but similar file sizes, which model would be better for tasks? If it varies by task, I’m mostly interested in coding and tool calling.
For example, Qwen3.5 122B UD-IQ2_XXS is 36.6 GB and Qwen3.5 35B Q8_0 is 36.9 GB. Which would be better at coding/tool calling?
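As a rough sanity check on that comparison, here is a quick bits-per-weight calculation from the quoted file sizes and parameter counts. It's a sketch only: it ignores GGUF metadata and the fact that mixed quants like UD-IQ2_XXS keep some tensors (e.g. embeddings and norms) at higher precision, so the real per-tensor precision varies.

```python
# Rough effective bits-per-weight from GGUF file size and parameter count.
# This is approximate: GGUF files contain metadata, and mixed quants store
# some tensors at higher precision than the headline quant level.

def bits_per_weight(file_size_gb: float, params_b: float) -> float:
    """file_size_gb: GGUF size in GB; params_b: parameter count in billions."""
    # GB * 8 bits, divided by billions of params, gives bits per weight.
    return file_size_gb * 8 / params_b

# The two GGUFs from the question:
print(bits_per_weight(36.6, 122))  # ~2.4 bpw for the 122B UD-IQ2_XXS
print(bits_per_weight(36.9, 35))   # ~8.4 bpw for the 35B Q8_0
```

So at nearly the same disk/VRAM footprint, the choice is roughly 122B at ~2.4 bits per weight versus 35B at ~8.4 bits per weight.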
In the spirit of the same question: how worthwhile is it to run very large models like Kimi 2.6 at 1-bit precision versus smaller models at higher precisions?


