Qwen 3.6 is the first local model that actually feels worth the effort for me

Reddit r/LocalLLaMA / 4/17/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A user reports that trying the qwen3.6-35b-a3b model is the first time a local LLM felt genuinely “worth it” rather than a hassle to use.
  • Compared with earlier local models that were either insufficient or required heavy post-fixing, Qwen 3.6 reportedly completes tasks with only minor guidance or end-stage corrections.
  • On a dual-GPU setup (5090 + 4090), the user says they can run an 8-bit (Q8) version with up to a 260k context window and achieve around 170 tokens per second, making it among the fastest they’ve tested.
  • The user contrasts the experience with other recently tried models (including Gemma 4), claiming Qwen 3.6 is more reliable at catching mistakes—often by asking it to review its own changes.
  • Overall, the post expresses optimism that local models will become efficient enough to run on mid-range computers, not just large data centers and paid services.

I spent some time yesterday after work trying out the new qwen3.6-35b-a3b model, and at least for me it's the first time that I actually felt that a local model wasn't more of a pain to use than it was worth.

I've been using LLMs in my personal/throwaway projects for a few months, for the kind of code I don't feel any passion writing (mostly UI XML in Avalonia, embedded systems C++). I used to have Sonnet and Opus for free thanks to GitHub's student program, but they cancelled that. I've been trying out local models for quite a while too, but up until this point they mostly felt either too dumb to get the job done, or they could complete the task but I would spend so much time fixing/tweaking/formatting/refactoring the code that I might as well have just done it myself.

Qwen3.6 seems to have finally changed that, at least on my system and projects. Running on a 5090 + 4090, I can load the Q8 model with the full 260k context, and at around 170 tokens per second it's also one of the fastest models I've tried. Unlike all the other models I've tried recently, including Gemma 4, it can actually complete tasks with only minor guidance or corrections at the end. Nine times out of ten, simply asking it to review its own changes once it is 'done' is enough for it to catch and correct anything that was wrong.
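For readers wanting to try a similar setup, a dual-GPU Q8 deployment like the one described could be served with llama.cpp's llama-server. This is a minimal sketch, not the poster's actual configuration: the GGUF filename, the tensor-split ratio, and the port are assumptions.

```shell
# Serve a Q8 GGUF of qwen3.6-35b-a3b across two GPUs (filename assumed).
# -c 262144      : ~260k-token context window
# -ngl 99        : offload all layers to the GPUs
# --tensor-split : rough VRAM ratio for a 32 GB 5090 + 24 GB 4090
llama-server \
  -m qwen3.6-35b-a3b-Q8_0.gguf \
  -c 262144 \
  -ngl 99 \
  --tensor-split 32,24 \
  --port 8080
```

Any OpenAI-compatible client (including most coding assistants) can then be pointed at `http://localhost:8080/v1` to use the model.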

I'm pretty impressed and it's really cool to see local models finally start to get to this point. It gives me hope for a future where this technology is not limited to massive data centers and subscription services, but rather being optimized to the point where even mid-range computers can take advantage of it.

submitted by /u/Epicguru