Higher precision or higher parameter count

Reddit r/LocalLLaMA / 4/26/2026

💬 Opinion · Ideas & Deep Analysis · Tools & Practical Usage · Models & Research

Key Points

  • The post asks whether, for models from the same family, higher precision or a higher parameter count (i.e., different parameter counts and quantizations at a similar file size) will generally perform better on real tasks.
  • It cites a concrete file-size comparison between Qwen3.5 122B MoE (UD-IQ2_XXS, 36.6 GB) and the denser Qwen3.5 35B (Q8_0, 36.9 GB), asking which would be better specifically for coding and tool calling.
  • The author is also curious about the practical trade-off of running very large models (e.g., Kimi 2.6) at extremely low precision like 1-bit versus using smaller models at higher precision.
  • Overall, the request focuses on guidance for local/offline deployment choices under memory constraints rather than on announcing any new model or feature.

I’m wondering: if we take models of the same family (e.g., Qwen3.5 MoEs) and compare GGUFs with different parameter counts and different quantizations but similar file sizes.

Which model would be better for real tasks? If it varies by task, I’m mostly interested in coding and tool calling.

As an example, Qwen3.5 122B UD-IQ2_XXS is 36.6 GB and Qwen3.5 35B Q8_0 is 36.9 GB.
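For a rough feel of why these two files end up at nearly the same size, here is a back-of-the-envelope bits-per-weight calculation using the figures quoted above. This is only a sketch, assuming decimal GB and ignoring metadata and per-tensor quant mixes:

```python
# Rough bits-per-weight estimate from GGUF file size and parameter count.
# Sizes are the ones quoted above; purely a back-of-the-envelope sketch.

def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Approximate average bits stored per parameter."""
    total_bits = file_size_gb * 1e9 * 8    # file size in bits (decimal GB assumed)
    total_params = params_billions * 1e9   # parameter count
    return total_bits / total_params

# Qwen3.5 122B at UD-IQ2_XXS: ~36.6 GB
print(f"122B UD-IQ2_XXS: ~{bits_per_weight(36.6, 122):.2f} bits/weight")
# Qwen3.5 35B at Q8_0: ~36.9 GB
print(f"35B  Q8_0:       ~{bits_per_weight(36.9, 35):.2f} bits/weight")
```

So the two files pack roughly 2.4 versus 8.4 bits per weight, which is what makes the comparison interesting despite the near-identical disk footprint.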

Which would be better at coding/tool calling?

In the spirit of the same question, how worthwhile is it to run very large models like Kimi 2.6 at 1-bit precision versus smaller models at higher precision?

submitted by /u/redblood252