I have to admit I am quite impressed. My hardware is an NVIDIA GeForce RTX 3060 with 12 GB of VRAM, so it's quite limited. I have been "model-hopping" to see what works best for me.
I mainly did my tests with Kilo Code, but sometimes I tried Roo Code as well.
Originally I used a customized Qwen 2.5 Coder for tool calls. It was relatively fast but would usually fail at the tool calls themselves.
Then I tested multiple Unsloth quantizations of Qwen 3 Coder. The 1-bit quants were also relatively fast but usually failed at tool calls as well. However, I've been using UD-TQ1_0 for code completion with Continue and it has been quite good, better than what I experienced with smaller Qwen2.5 Coder models. The 2-bit quants worked a little better (they would still fail sometimes), but they started to feel really slow and kind of unstable.
Then, as in my original tests with Qwen 2.5, I tried this version of Qwen3, also optimized for tools (14b). My experience was significantly better but still a bit slow; I should probably have gone with 8b instead. I noticed that these general Qwen versions that are not optimized for coding worked better for me, probably because they were smaller and fit better in my VRAM. So instead of trying Qwen3-8b, I went with Qwen3.5-9b, and this is where I got really surprised.
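For anyone wondering why "smaller fits better" matters so much on a 12 GB card, here is a rough back-of-the-envelope sketch. It only estimates the size of the weights from parameter count and bits per weight. The 4-bit figure and the 2 GB allowance for KV cache and runtime overhead are my own illustrative assumptions, not numbers I measured, and the function names are made up for this example.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 12.0, overhead_gb: float = 2.0) -> bool:
    """Rough check: weights plus an assumed overhead must fit in VRAM.

    overhead_gb is a placeholder for KV cache, activations and runtime
    buffers; the real value depends on context length and backend.
    """
    return weight_footprint_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Hypothetical quantization levels, for illustration only
    for name, params, bits in [("9B @ 4-bit", 9, 4), ("14B @ 4-bit", 14, 4), ("30B @ 4-bit", 30, 4)]:
        size = weight_footprint_gb(params, bits)
        print(f"{name}: ~{size:.1f} GB weights, fits in 12 GB: {fits_in_vram(params, bits)}")
```

This ignores that real quant formats mix bit widths across layers, so treat it as an order-of-magnitude check rather than a prediction of actual usage.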
I finally had the agent working for more than an hour, doing fairly significant work and able to keep going by itself without getting stuck.
I know every setup is different, but if you are running on consumer hardware with limited VRAM, I think this represents amazing progress.
TL;DR: Qwen 3.5 (9B) with 12 GB of VRAM actually works very well for agentic calls. Unsloth-Qwen3 Coder 30B UD-TQ1_0 is good for code completion.