Local coding with 12 GB VRAM, 32 GB RAM - best models?

Reddit r/LocalLLaMA / 4/12/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • A new user asks whether high-quality local LLM coding workflows (similar to Claude Sonnet/Opus) are achievable on hardware with 12 GB VRAM and 32 GB RAM, including running the model overnight.
  • The post questions whether upgrading/doubling hardware would meaningfully improve model quality locally, or whether Sonnet/Opus-level quality remains largely available only via APIs.
  • The discussion is framed around local hosting constraints (VRAM limits, latency versus overnight runs) and selecting "best models" that fit the stated machine specs (see the rough sizing sketch after this list).
  • It highlights a practical decision point for users comparing local inference quality/capability versus cloud/API access.
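
For context on the VRAM constraint, here is a minimal back-of-the-envelope sketch. It assumes roughly 4.5 bits per weight for a Q4_K_M-style quantization and a fixed allowance for KV cache and runtime overhead; the function name and all numbers are illustrative assumptions, not measurements of any particular model or runtime.

```python
# Rough VRAM estimate for running a quantized local model.
# All figures are assumptions for illustration, not benchmarks.

def est_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                overhead_gb: float = 1.5) -> float:
    """Estimate VRAM needed: quantized weights plus KV cache / runtime overhead.

    params_b        -- model size in billions of parameters
    bits_per_weight -- ~4.5 bits is typical for a Q4_K_M-style quant (assumed)
    overhead_gb     -- assumed KV cache + runtime overhead at a modest context
    """
    weights_gb = params_b * bits_per_weight / 8  # GB for the weights alone
    return weights_gb + overhead_gb

VRAM_GB = 12  # the poster's card

for size in (7, 8, 14, 24, 32, 70):  # common open-model parameter counts
    need = est_vram_gb(size)
    fits = "fits" if need <= VRAM_GB else "needs CPU offload / more VRAM"
    print(f"{size:>3}B  ~{need:4.1f} GB  -> {fits}")
```

Under those assumptions, models up to roughly 14B parameters fit entirely in 12 GB of VRAM, while 24B+ models would need offloading into the 32 GB of system RAM, which is where slow overnight runs become the relevant trade-off.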

I'm new to hosting an LLM locally.

I've been using Claude Sonnet a lot and having a lot of success with it. I'd like to explore a workflow where I leave a local LLM running overnight on my hardware, so it doesn't need to be fast, but I do need quality on the level of models such as Sonnet and Opus.

Is this currently achievable within these sorts of specs? Would doubling my hardware make it achievable, or is that kind of quality only available over an API for now?

submitted by /u/TechnicalyAnIdiot