Agentic coding Qwen 3.6, Q6_K 125k context vs Q5_K_XL 200k context

Reddit r/LocalLLaMA / 4/18/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The post is a Reddit discussion asking which Qwen 3.6 coding setup to choose: Q6_K quantization with a 125k context window versus Q5_K_XL with a 200k context window.
  • The author questions whether a 125k context window is truly viable for “agentic coding,” and whether a “compact” model configuration is sufficient.
  • The author reports observed throughput of about 165–170 tokens per second with either configuration on an RTX 5090.
  • Overall, the takeaway is a practical comparison focused on context length trade-offs and real-world speed for agentic coding workflows rather than a new product release.

What would you choose if you were in my shoes? How viable is 125k for agentic coding, really? Is "compact" really good enough, or would you go with Q6_K 125k?

I am getting around 165–170 tok/sec with either config on my 5090.
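The trade-off the post is weighing comes down to VRAM budget: a lower-bit quant (Q5_K_XL) frees memory that can instead hold a longer KV cache. A rough back-of-the-envelope calculator is sketched below; every model dimension in it (layer count, KV heads, head size, cache precision) is an illustrative assumption, not a published Qwen 3.6 spec.

```python
# Rough KV-cache size estimator for comparing context-length budgets.
# All model dimensions here are hypothetical placeholders, not Qwen 3.6 specs.

def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache K and V for `context_len` tokens.

    The factor of 2 accounts for storing both the K and the V tensor
    at every layer; bytes_per_elem=2 assumes an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Assumed shape: 48 layers, 8 grouped-query KV heads, head_dim 128, fp16 cache.
for ctx in (125_000, 200_000):
    gib = kv_cache_bytes(48, 8, 128, ctx) / 2**30
    print(f"{ctx:>7} tokens of context -> {gib:.1f} GiB of KV cache")
```

Under these assumed numbers the 200k configuration needs roughly 60% more cache memory than the 125k one, which is exactly the gap a smaller weight quant has to buy back; a quantized (e.g. 8-bit) KV cache would halve both figures.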

submitted by /u/ComfyUser48