Qwen 3.5 35B on 8GB VRAM for local agentic workflow

Reddit r/LocalLLaMA / 3/22/2026


Key Points

  • They pivoted from Antigravity to local LLMs and settled on Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF) for local agentic workflows.
  • The setup uses a Lenovo Legion with an i9-14900HX, 32GB RAM, and an RTX 4060m (8GB VRAM).
  • They report ~700 tokens/sec for prompt processing and ~42 tokens/sec for generation on 8GB VRAM, using a llama.cpp config with flags such as -ngl 99, --n-cpu-moe 40, -c 192000, -t 12, -tb 16, -b 4096, --ubatch-size 2048, --flash-attn on, --cache-type-k q8_0, --cache-type-v q8_0, and --mlock.
  • They compare the setup to Antigravity and ask whether to stick with Gemini 3 Flash there, mentioning complementary tooling: Cline in VSCode, with kat-coder-pro for Plan mode and Qwen 3.5 for Act mode.

Recently I had been using Antigravity for most of the vibe-coding work I needed, but the limits have hit hard (I have the Google AI Pro yearly plan).

So I pivoted to local LLMs to augment it. After extensive testing of different models I have settled on Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF).

My specs (Lenovo Legion):

  • CPU: i9-14900HX (8 P-cores, E-cores disabled in BIOS)
  • RAM: 32GB DDR5
  • GPU: RTX 4060m (8GB VRAM)

Currently I am getting about 700 t/s for prompt processing and 42 t/s for token generation, which is respectable for my 8GB VRAM GPU. Here are the settings I settled on after some testing:

Using llama.cpp (the trailing ^ is the Windows batch line-continuation character):

-ngl 99 ^
--n-cpu-moe 40 ^
-c 192000 ^
-t 12 ^
-tb 16 ^
-b 4096 ^
--ubatch-size 2048 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--mlock
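For a rough sense of what those throughput numbers mean per agent turn, here is a back-of-the-envelope estimate (the 32k-token prompt and 1k-token reply sizes are illustrative assumptions, not figures from the post):

```shell
# Estimated wall-clock time for one request at the reported speeds:
# prompt tokens / 700 t/s (prefill) + output tokens / 42 t/s (generation).
# The example sizes (32k-token prompt, 1k-token reply) are assumptions.
awk -v prompt=32000 -v output=1000 \
    'BEGIN { printf "%.1f s\n", prompt/700 + output/42 }'
# prints "69.5 s"
```

In other words, even a fairly full context turns around in about a minute, and prefill time overtakes generation time once the prompt grows past roughly 17k tokens for a 1k-token reply.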

After some research, the closest thing to Antigravity I could find is Cline in VSCode. I use kat-coder-pro for Plan mode and Qwen 3.5 for Act mode. Is this setup better, or should I stick with Gemini 3 Flash in Antigravity, which has generous limits and is pretty fast? I don't care much about privacy, only about getting work done smoothly. Any suggestions for potential improvement?

Thanks.

submitted by /u/Heisenberggg03
