What's the best coding model at 4B or 8B parameters?

Reddit r/LocalLLaMA / 4/19/2026

💬 Opinion · Signals & Early Trends · Tools & Practical Usage · Models & Research

Key Points

  • A Reddit user asks the community which coding LLM is best for their constrained hardware, specifically models around 4B or 8B parameters due to a GTX 1050 4GB VRAM limit.
  • They report that they can only run up to roughly 4B (and have tested smaller 1B models) and that their quantization ceiling is around Q3_XXS, with more aggressive quant (Q2/Q1) likely degrading quality.
  • The user mentions they have searched benchmarks across Google/Hugging Face/YouTube and tried models via LM Studio, but wants experience-based recommendations rather than numbers alone.
  • They also note they haven’t been able to test very large models (e.g., Qwen 3.6 35B) and are considering 8B or even 14B models for their setup.
  • Overall, the post is a request for practical, hands-on guidance on selecting a local coding model that balances performance and accuracy on low-VRAM GPUs.

Yeah, I know the title looks stupid. Yes, I've done searches: I searched Google, Hugging Face, and YouTube, and I even tested some models via LM Studio. But due to my low-end GPU (GTX 1050, 4 GB VRAM) I can't fit more than a 4B (or 1B) model into it. I have about 20 GB of RAM plus a 15 GB pagefile. I didn't get the chance to test Qwen 3.6 35B. My maximum quant was Q3_XXS; anything beyond that (Q2, Q1) drops plenty of information and makes the model way dumber. So I'm thinking about 8B and maybe 14B, but most of my searches turned up only numbers and benchmarks, so I figured I'd come here and ask people who have tried these models themselves and seen the results.
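
For a back-of-the-envelope check on what fits in a 4 GB card, the usual rule of thumb is: weight footprint ≈ parameter count × bits-per-weight ÷ 8, plus headroom for the KV cache and CUDA context. Below is a minimal sketch of that arithmetic in Python. The bits-per-weight figures are approximate averages for common llama.cpp GGUF quant types (taking the poster's "Q3_XXS" to mean IQ3_XXS), and the 1.2× overhead factor is a loose assumption, not a measured constant.

```python
# Rough feasibility check: does an N-billion-parameter GGUF model fit in VRAM?
# Bits-per-weight values are approximate averages for llama.cpp quant types;
# the overhead factor (KV cache, CUDA context, buffers) is an assumed 1.2x.

QUANT_BITS = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "IQ3_XXS": 3.1,   # roughly what the poster calls "Q3_XXS"
    "Q2_K": 2.6,
}

def est_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Estimated GB of VRAM needed to fully offload the model at the given quant."""
    weight_gb = params_billion * QUANT_BITS[quant] / 8  # 1B params at 8 bpw ~= 1 GB
    return weight_gb * overhead

if __name__ == "__main__":
    budget_gb = 4.0  # GTX 1050
    for size in (1, 4, 8, 14):
        for quant in ("Q4_K_M", "IQ3_XXS", "Q2_K"):
            need = est_vram_gb(size, quant)
            verdict = "fits" if need <= budget_gb else "needs CPU offload"
            print(f"{size}B @ {quant}: ~{need:.1f} GB -> {verdict}")
```

By this estimate, an 8B model at ~3 bits per weight is already close to the full 4 GB budget before the KV cache grows, which lines up with the poster's report that roughly 4B is the practical limit on this card; a 14B model would need partial CPU offload into their 20 GB of RAM.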

submitted by /u/Felix_455-788