Qwen3.6 is incredible with OpenCode!

Reddit r/LocalLLaMA / 4/18/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research

Key Points

  • The author reports that Qwen3.6 performs markedly better than previously tried local coding models (including Gemma 4), making it feel viable for day-to-day coding tasks rather than relying on Claude Code.
  • In a demanding test implementing Postgres Row-Level Security (RLS) across a multi-service Rust/TypeScript/Python codebase, the model reportedly produced an excellent solution with useful iteration from compiler errors.
  • While not perfect—there are major gaps and some bugs—the author highlights that it can stay on track and refine its approach instead of getting lost.
  • The author describes an interactive coding session where the model proposed an overly broad edit (29 files) but then adjusted to a lower-churn solution involving acquiring a DB connection and scoping it to the user per incoming request.
  • Performance and setup details are shared, including a local llama.cpp deployment using Qwen3.6-35B-A3B on an RTX 4090 with 24GB VRAM, large context size (262k), and specific llama.cpp flags to avoid OOM crashes caused by Opencode’s parallel tool calls.

I've tried a few different local models in the past (Gemma 4 being the latest), but none of them felt as good as this. (Or maybe I just didn't give them a proper chance; you guys let me know.) But this genuinely feels like a model I could daily-drive for certain tasks instead of reaching for Claude Code.

I gave it a fairly complex task of implementing RLS in Postgres across a large-ish codebase with multiple services written in Rust, TypeScript, and Python. I had zero expectations going in, but it did an amazing job. PR: https://github.com/getomnico/omni/pull/165/changes/dd04685b6cf47e7c3791f9cdbd807595ef4c686e
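For anyone unfamiliar with the task: a Postgres RLS setup of the kind described generally boils down to enabling RLS on a table and attaching a policy keyed on a per-connection setting. This is an illustrative sketch only, not the SQL from the PR; the table and column names (`documents`, `owner_id`, `app.current_user_id`) are made up:

```python
# Illustrative Postgres RLS DDL, held as Python strings so the
# statements can be inspected. All identifiers here are hypothetical.
ENABLE_RLS = "ALTER TABLE documents ENABLE ROW LEVEL SECURITY;"

# Policy: a row is visible only when its owner matches the user id
# stored in the transaction-local setting app.current_user_id.
TENANT_POLICY = """
CREATE POLICY tenant_isolation ON documents
    USING (owner_id = current_setting('app.current_user_id')::uuid);
""".strip()

if __name__ == "__main__":
    print(ENABLE_RLS)
    print(TENANT_POLICY)
```

With a policy like this in place, every query on the table is filtered automatically, which is why the per-request scoping described below is all the application code needs to add.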

Now, it's far from perfect: there are major gaps and a couple of major bugs, but my god, is this thing good. It doesn't one-shot Rust like Opus can, but it's able to look at compiler errors and iterate without getting lost.

I had a fairly long coding session lasting multiple rounds of plan -> build -> plan... At one point it went down a path of editing 29 files to use RLS across all DB queries, which was OK, but I stepped in and asked it to reconsider and maybe look at other options to minimize churn. It found the right solution: acquiring a DB connection and scoping it to the user at the beginning of each incoming request.
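The low-churn pattern the model landed on can be sketched roughly like this (a hypothetical Python sketch; the actual services are Rust/TypeScript/Python, and the pool/connection names here are invented). The key idea is that `set_config(..., true)` is transaction-local, so the user scope can't leak when the connection returns to the pool:

```python
from contextlib import contextmanager

@contextmanager
def user_scoped_connection(pool, user_id):
    """Check out one connection per request and bind it to a user.
    set_config's third argument (is_local=true) scopes the setting to
    the current transaction, so it is gone after COMMIT/ROLLBACK."""
    conn = pool.acquire()
    try:
        conn.execute("BEGIN")
        # Parameterized, so the user id never gets spliced into SQL text.
        conn.execute("SELECT set_config('app.current_user_id', %s, true)",
                     (str(user_id),))
        yield conn  # every query here runs under this user's RLS policies
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        pool.release(conn)


# Minimal fakes so the pattern can be exercised without a real database.
class FakeConn:
    def __init__(self):
        self.log = []
    def execute(self, sql, params=None):
        self.log.append(sql)

class FakePool:
    def __init__(self):
        self.conn = FakeConn()
        self.released = False
    def acquire(self):
        return self.conn
    def release(self, conn):
        self.released = True

pool = FakePool()
with user_scoped_connection(pool, "42") as conn:
    conn.execute("SELECT * FROM documents")  # filtered by RLS in real use
```

The appeal over the 29-file edit is obvious: the per-query code stays untouched, and the RLS policies do the filtering once the connection is scoped.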

For the first time, it felt like talking to a truly capable local coding model.

My setup:

  • Qwen3.6-35B-A3B, IQ4_NL unsloth quant
  • Deployed locally via llama.cpp
  • RTX 4090, 24 GB
  • KV cache quant: q8_0
  • Context size: 262k. At this ctx size, VRAM use sits at ~21 GB
  • Thinking enabled, with the recommended settings for temp, min_p, etc.

llama server:

```
docker run -d --name llama-server --gpus all \
  -v <path_to_models>:/models -p 8080:8080 \
  local/llama.cpp:server-cuda \
  -m /models/qwen3.6-35b-a3b/Qwen3.6-35B-A3B-UD-IQ4_NL.gguf \
  --port 8080 --host 0.0.0.0 \
  --ctx-size 262144 -n 8192 --n-gpu-layers 40 \
  --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 \
  --parallel 1 --cache-type-k q8_0 --cache-type-v q8_0 \
  --cache-ram 4096
```

Had to set `--parallel` and `--cache-ram`, without which llama.cpp would crash with OOM, because OpenCode makes a bunch of parallel tool calls that blow up the prompt cache. I get 100+ output tok/sec with this.

But this might be it guys... the holy grail of local coding! Or getting very close to it at any rate.

submitted by /u/CountlessFlies