What's the best coding model at 4B or 8B parameters?

Reddit r/LocalLLaMA / 4/19/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A Reddit user asks the community which coding LLM is best for their constrained hardware, specifically models around 4B or 8B parameters due to a GTX 1050 4GB VRAM limit.
  • They report that they can only run up to roughly 4B (and have tested smaller 1B models) and that their quantization ceiling is around Q3_XXS, with more aggressive quant (Q2/Q1) likely degrading quality.
  • The user mentions they have searched benchmarks across Google/Hugging Face/YouTube and tried models via LM Studio, but wants experience-based recommendations rather than numbers alone.
  • They also note they haven’t been able to test very large models (e.g., Qwen 3.6 35B) and are considering 8B or even 14B models for their setup.
  • Overall, the post is a request for practical, hands-on guidance on selecting a local coding model that balances performance and accuracy on low-VRAM GPUs.

Yeah, I know the title looks stupid. Yes, I've done searches: I searched Google, Hugging Face, and YouTube, and I even tested some models via LM Studio. But due to my low-end GPU (GTX 1050, 4 GB VRAM) I can't fit more than a 4B (or 1B) model into it. I have about 20 GB of RAM plus a 15 GB pagefile. I didn't get the chance to test Qwen 3.6 35B. My maximum quant was Q3_XXS; anything beyond that (Q2, Q1) drops plenty of information and makes the model way dumber. So I'm thinking about 8B and maybe 14B, but most of my searches turned up only numbers and benchmarks, so I figured I'd come here and ask people who have tried these models themselves and seen the results.
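
For a back-of-the-envelope check on what fits in a 4 GB card, the usual rule of thumb is: weight footprint ≈ parameter count × bits-per-weight ÷ 8, plus headroom for the KV cache and CUDA context. Below is a minimal sketch of that arithmetic in Python. The bits-per-weight figures are approximate averages for common llama.cpp GGUF quant types (taking the poster's "Q3_XXS" to mean IQ3_XXS), and the 1.2× overhead factor is a loose assumption, not a measured constant.

```python
# Rough feasibility check: does an N-billion-parameter GGUF model fit in VRAM?
# Bits-per-weight values are approximate averages for llama.cpp quant types;
# the overhead factor (KV cache, CUDA context, buffers) is an assumed 1.2x.

QUANT_BITS = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "IQ3_XXS": 3.1,   # roughly what the poster calls "Q3_XXS"
    "Q2_K": 2.6,
}

def est_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Estimated GB of VRAM needed to fully offload the model at the given quant."""
    weight_gb = params_billion * QUANT_BITS[quant] / 8  # 1B params at 8 bpw ~= 1 GB
    return weight_gb * overhead

if __name__ == "__main__":
    budget_gb = 4.0  # GTX 1050
    for size in (1, 4, 8, 14):
        for quant in ("Q4_K_M", "IQ3_XXS", "Q2_K"):
            need = est_vram_gb(size, quant)
            verdict = "fits" if need <= budget_gb else "needs CPU offload"
            print(f"{size}B @ {quant}: ~{need:.1f} GB -> {verdict}")
```

By this estimate, an 8B model at ~3 bits per weight is already close to the full 4 GB budget before the KV cache grows, which lines up with the poster's report that roughly 4B is the practical limit on this card; a 14B model would need partial CPU offload into their 20 GB of RAM.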

submitted by /u/Felix_455-788