I spent some time yesterday after work trying out the new qwen3.6-35b-a3b model, and at least for me it's the first time that I actually felt that a local model wasn't more of a pain to use than it was worth.
I've been using LLMs in my personal/throwaway projects for a few months, for the kind of code that I don't feel any passion writing (most UI XML in Avalonia, embedded systems C++), and I used to have Sonet and Opus for free thanks to Github's student program but they cancelled that. I've been trying out local models for quite a while too but it's mostly felt up until this point that they were either too dumb to get the job done, or they could complete it but I would spend so much time fixing/tweaking/formatting/refactoring the code that I might as well have just done it myself.
Qwen3.6 seems to have finally changed that, at least on my system and projects. Running on a 5090 + 4090 I can load the Q8 model with full 260k context, getting around 170 tokens per second also makes it one of the fastest models I've tried. And unlike all other models I've tried recently including Gemma 4, it can actually complete tasks and only requires minor guidance or corrections at the end. 9 times out of 10, simply asking it to review its own changes once it is 'done' is enough for it to catch and correct anything that was wrong.
I'm pretty impressed and it's really cool to see local models finally start to get to this point. It gives me hope for a future where this technology is not limited to massive data centers and subscription services, but rather being optimized to the point where even mid-range computers can take advantage of it.
[link] [comments]



