Recently I had been using Antigravity for mostly vibe coding stuff that i needed. But the limits have hit hard. (have google ai pro yearly plan)
So I pivoted to local LLMs to augment it. After extensive testing of different models I have settled on Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF).
My specs are: (Lenovo Legion)
- CPU: i9-14900HX (8 P-Cores, E-cores disabled in BIOS, 32GB DDR5 RAM)
- GPU: RTX 4060m (8GB VRAM)
Currently I am getting about 700t/s for prompt processing and 42t/s for token generation which is respectable for my 8gb vram gpu. Here are the settings i settled upon after some testing:
Using llama cpp:
-ngl 99 ^
--n-cpu-moe 40 ^
-c 192000 ^
-t 12 ^
-tb 16 ^
-b 4096 ^
--ubatch-size 2048 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--mlock
After some research the closest thing to Antigravity I could find is Cline in VSCode. I use kat-coder-pro for Plan and qwen3.5 for Act mode. Is this setup better or should i stick to google gemini 3 flash in antigravity which has plenty of limits and is pretty fast? I dont care much about privacy, only about getting work done smoothly. Any suggestions for potential improvement?
Thanks.
[link] [comments]

