Hello everyone,
After a long time testing different local models, quantizations, and tools, I wanted to share the setup I ended up sticking with for coding.
Hardware:
R5 5600X / 32GB RAM / RTX 3070 8GB
Setup:
- llama.cpp (CUDA)
- OmniCoder-9B (Q4_K_M, Q8 cache, 64K context)
- Qwen Code CLI
- Superpowers (GitHub)
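For anyone who wants to reproduce the llama.cpp side, the launch looks roughly like this (the GGUF filename is a placeholder for whatever your quant is called, and you may need to tune the layer offload for 8GB VRAM):

```shell
# Launch llama.cpp's OpenAI-compatible server with a Q8 KV cache and 64K context.
# The model path is a placeholder -- point -m at your actual Q4_K_M GGUF.
# -ngl 99 offloads as many layers as fit on the GPU;
# the quantized V-cache (q8_0) needs flash attention enabled.
llama-server \
  -m ./OmniCoder-9B-Q4_K_M.gguf \
  -c 65536 \
  -ngl 99 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --flash-attn \
  --port 8080
```

Qwen Code can then be pointed at the local OpenAI-compatible endpoint (http://localhost:8080/v1) instead of a cloud backend.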
I also tested Opencode + GLM-5 and Antigravity with Gemini 3.1 High.
In my experience, this setup strikes a good balance between speed and output quality. It handles longer responses well and feels stable enough for regular coding use, especially for entry-level to intermediate tasks.
Since it’s fully local, there are no rate limits or API costs, which makes it practical for daily use.
Curious to know what others are using and if there are better combinations I should try.