Qwen 3.5 35B on 8GB VRAM for local agentic workflow

Reddit r/LocalLLaMA / 3/22/2026


Key Points

  • They pivoted from Antigravity to local LLMs and settled on Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF) for local agentic workflows.
  • The setup uses a Lenovo Legion with an i9-14900HX, 32GB RAM, and an RTX 4060m (8GB VRAM).
  • They report ~700 tokens/sec for prompt processing and ~42 tokens/sec for generation on 8GB VRAM, using a llama.cpp config with flags such as -ngl 99, --n-cpu-moe 40, -c 192000, -t 12, -tb 16, -b 4096, --ubatch-size 2048, --flash-attn on, --cache-type-k q8_0, --cache-type-v q8_0, and --mlock.
  • They compare the setup to Antigravity and ask whether to stick with Gemini 3 Flash there, mentioning complementary tooling: Cline in VSCode, with kat-coder-pro for Plan mode and Qwen 3.5 for Act mode.

Recently I had been using Antigravity for most of the vibe-coding work I needed, but the limits have hit hard (I have the Google AI Pro yearly plan).

So I pivoted to local LLMs to augment it. After extensive testing of different models I have settled on Qwen 3.5 35B A3B Heretic Opus (Q4_K_M GGUF).

My specs (Lenovo Legion):

  • CPU: i9-14900HX (8 P-cores, E-cores disabled in BIOS)
  • RAM: 32GB DDR5
  • GPU: RTX 4060m (8GB VRAM)

Currently I am getting about 700 t/s for prompt processing and 42 t/s for token generation, which is respectable for my 8GB VRAM GPU. Here are the settings I settled on after some testing:

Using llama.cpp (the trailing ^ is the Windows batch line-continuation character):

-ngl 99 ^
--n-cpu-moe 40 ^
-c 192000 ^
-t 12 ^
-tb 16 ^
-b 4096 ^
--ubatch-size 2048 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--mlock
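For a rough sense of what those throughput numbers mean per agent turn, here is a back-of-the-envelope estimate (the 32k-token prompt and 1k-token reply sizes are illustrative assumptions, not figures from the post):

```shell
# Estimated wall-clock time for one request at the reported speeds:
# prompt tokens / 700 t/s (prefill) + output tokens / 42 t/s (generation).
# The example sizes (32k-token prompt, 1k-token reply) are assumptions.
awk -v prompt=32000 -v output=1000 \
    'BEGIN { printf "%.1f s\n", prompt/700 + output/42 }'
# prints "69.5 s"
```

In other words, even a fairly full context turns around in about a minute, and prefill time overtakes generation time once the prompt grows past roughly 17k tokens for a 1k-token reply.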

After some research, the closest thing to Antigravity I could find is Cline in VSCode. I use kat-coder-pro for Plan mode and Qwen 3.5 for Act mode. Is this setup better, or should I stick with Gemini 3 Flash in Antigravity, which has generous limits and is pretty fast? I don't care much about privacy, only about getting work done smoothly. Any suggestions for potential improvement?

Thanks.

submitted by /u/Heisenberggg03
