Been using Qwen-3.6-27B-q8_k_xl + VSCode + RTX 6000 Pro As Daily Driver

Reddit r/LocalLLaMA / 5/2/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • The author reports trying Qwen 3.6 as a daily local LLM setup using VSCode (Insiders), LM Studio, and an RTX 6000 Pro, after setting up local model support with ease.
  • After testing multiple quantized variants across Qwen 3.6 and Gemma 4, they identify Qwen-3.6-27B-q8_k_xl (via Unsloth) as the clear best performer for their use.
  • They find token generation somewhat slow but comparable to hosted models they previously used (including GitHub Copilot), with notably capable performance when paired with the right tool-calling workflow.
  • The model still requires guidance to achieve good code quality; the author notes it may not operate at the “feature-level” expected from top-tier models like Opus 4.6.
  • They conclude that with a careful “Plan” step and solid systems architecture understanding, the local setup can implement tasks reliably without using any API tokens, while they now need more GPU compute (another RTX 6000).

So in response to the Great Token Reckoning of 2026, I decided to try out Qwen 3.6 as a daily driver, and although it's only been about a day, I have to say I'm thoroughly impressed.

I had to download the VSCode Insiders edition and set up local model support - super easy. Then I messed around with Gemma 4 and Qwen 3.6 (served with LM Studio) while performing typical tasks as I build out an app that does a lot of data mining and web scraping.
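
For anyone wanting to reproduce the serving side: LM Studio exposes an OpenAI-compatible server (http://localhost:1234/v1 by default), so any OpenAI-style client can talk to the loaded model. A minimal sketch, assuming the `openai` Python package is installed and the model is already loaded in LM Studio - the model name string here is hypothetical, use whatever your LM Studio instance reports:

```python
# Minimal sketch: talk to a model served by LM Studio's local
# OpenAI-compatible server (default http://localhost:1234/v1).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",  # LM Studio ignores the key, but the client requires one
)

resp = client.chat.completions.create(
    model="qwen-3.6-27b",  # hypothetical identifier; use the name LM Studio shows
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a function that dedupes scraped URLs."},
    ],
)
print(resp.choices[0].message.content)
```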

After trying all the quantized variants of both models, there is a clear winner: Qwen-3.6-27B-q8_k_xl by Unsloth.

I AM SO IMPRESSED! Token generation can be a tad slow, but the truth is, I was seeing long delays even when I was using GitHub Copilot's hosted models. It felt about the same speed-wise overall, maybe a touch slower than hosted. But what's impressive is that with appropriate tool calling, this little dense model can hold its own just fine.
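
To give a feel for what "appropriate tool calling" means mechanically: the local server speaks the same OpenAI-style tools protocol, so the harness (VSCode's agent mode, or your own script) advertises functions and the model emits structured calls back. A hedged sketch of that round trip - `read_file` is a made-up example tool, and in practice the IDE wires up its own toolset:

```python
# Hedged sketch of an OpenAI-style tool-calling request against the
# same local endpoint. `read_file` is an illustrative tool, not a
# real part of any IDE's toolset.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Return the contents of a file in the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen-3.6-27b",  # hypothetical identifier; use what LM Studio reports
    messages=[{"role": "user", "content": "What rate limit does scraper/config.py set?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model decided it needs the tool
    for call in msg.tool_calls:
        print(call.function.name, json.loads(call.function.arguments))
```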

To be clear, I don't think it can work at the feature level like Opus 4.6 could. You can't just say "Hey, implement this feature" - vibe coders and non-coders most likely won't survive with this. There were a few times where I had to steer it to improve its code quality and approach, but functionally it was nailing it.

If you always do a Plan round first and really work out all the details, it will get there and then implement without issue. If you have a decent grasp of systems architecture, this is squarely hitting that "good enough" status for a local model. I have been plugging away all day and haven't used a single API token.
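
The Plan-first loop is easy to reproduce outside the IDE too: one pass asks only for a detailed plan, you review and edit it by hand, then a second pass implements strictly against the approved plan. A rough sketch of that two-call structure, against the same local server - the prompts and the helper are mine, not a fixed recipe:

```python
# Rough sketch of the plan-then-implement loop described above, as two
# separate chat calls. Prompts and the review step are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
MODEL = "qwen-3.6-27b"  # hypothetical identifier

def chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

task = "Add retry-with-backoff to the scraper's HTTP fetch layer."

# Pass 1: plan only -- no code, just the detailed approach.
plan = chat("You are a senior engineer. Produce a step-by-step plan. No code.", task)
print(plan)  # review and edit this by hand before proceeding

# Pass 2: implement strictly against the approved plan.
code = chat("Implement exactly the following plan. Output code only.",
            f"Task: {task}\n\nApproved plan:\n{plan}")
print(code)
```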

Now I need another RTX 6000 so I'm not fighting with my agents for compute 😝

submitted by /u/Demonicated