Qwen 3.6 q8 at 50t/s or q4 at 112 t/s?

Reddit r/LocalLLaMA / 4/18/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • The post asks how to decide between Qwen 3.6 quantized models running at different speeds (Q8 at 50 t/s versus Q4 at 112 t/s) for use in a local inference harness like pi.
  • The author reports that Q4 was extremely consistent and reliable in their testing, including running with a 131k context window and surviving two compacting steps on a well-defined task without breaking behavior.
  • They plan to test Q8 next and want others’ impressions on the expected qualitative differences between Q8 and Q4 in practice.
  • Overall, the discussion focuses on performance tradeoffs between higher-precision (Q8) and higher-throughput (Q4) settings for long-context, robustness-sensitive runs.
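The throughput side of the tradeoff is easy to quantify up front; the quality difference between Q8 and Q4 is the part only testing reveals. A back-of-the-envelope sketch (throughput figures are from the post; the output-token budget is a hypothetical example):

```python
# Rough generation-time comparison for the two quants.
# Throughputs (t/s) are as reported in the post; the token budget is made up.
SPEEDS_TPS = {"q8": 50, "q4": 112}
output_tokens = 20_000  # hypothetical total output for a long agentic run

for quant, tps in SPEEDS_TPS.items():
    minutes = output_tokens / tps / 60
    print(f"{quant}: ~{minutes:.1f} min for {output_tokens} tokens")

# Q4 finishes 112/50 = 2.24x faster. Whether Q8's extra precision repays
# the wait depends on task failure/retry rates, which only testing shows.
```

Note that a single failed run that must be redone can erase the speed advantage, which is why the reliability the author observed at Q4 matters as much as the raw t/s numbers.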

How would you go about choosing between the two for use in a harness like pi?

Did a good bit with q4 yesterday and it was so consistent and reliable. I had it set to 131k context and it worked through 2 compactions on a clearly defined task without messing the whole thing up. Very excited about this recent step forward.

I'm going to start working with the q8 some today, but I was interested in your impressions of the kinds of differences I might expect between the two.

submitted by /u/GotHereLateNameTaken