Local coding with 12 GB VRAM, 32 GB RAM - best models?

Reddit r/LocalLLaMA / 4/12/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • A new user asks whether high-quality local LLM coding workflows (similar to Claude Sonnet/Opus) are achievable on hardware with 12 GB VRAM and 32 GB RAM, including running the model overnight.
  • The post questions whether upgrading/doubling hardware would meaningfully improve model quality locally, or whether Sonnet/Opus-level quality remains largely available only via APIs.
  • The discussion is framed around local hosting constraints (VRAM limits, latency versus overnight runs) and selecting "best models" that fit the stated machine specs (see the rough sizing sketch after this list).
  • It highlights a practical decision point for users comparing local inference quality/capability versus cloud/API access.
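
For context on the VRAM constraint, here is a minimal back-of-the-envelope sketch. It assumes roughly 4.5 bits per weight for a Q4_K_M-style quantization and a fixed allowance for KV cache and runtime overhead; the function name and all numbers are illustrative assumptions, not measurements of any particular model or runtime.

```python
# Rough VRAM estimate for running a quantized local model.
# All figures are assumptions for illustration, not benchmarks.

def est_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                overhead_gb: float = 1.5) -> float:
    """Estimate VRAM needed: quantized weights plus KV cache / runtime overhead.

    params_b        -- model size in billions of parameters
    bits_per_weight -- ~4.5 bits is typical for a Q4_K_M-style quant (assumed)
    overhead_gb     -- assumed KV cache + runtime overhead at a modest context
    """
    weights_gb = params_b * bits_per_weight / 8  # GB for the weights alone
    return weights_gb + overhead_gb

VRAM_GB = 12  # the poster's card

for size in (7, 8, 14, 24, 32, 70):  # common open-model parameter counts
    need = est_vram_gb(size)
    fits = "fits" if need <= VRAM_GB else "needs CPU offload / more VRAM"
    print(f"{size:>3}B  ~{need:4.1f} GB  -> {fits}")
```

Under those assumptions, models up to roughly 14B parameters fit entirely in 12 GB of VRAM, while 24B+ models would need offloading into the 32 GB of system RAM, which is where slow overnight runs become the relevant trade-off.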

I'm new to hosting an LLM locally.

I've been using Claude Sonnet a lot and having a lot of success with it. I'd like to explore a workflow where I leave a local LLM running overnight on my hardware, so it doesn't need to be fast, but I do need quality on the level of models such as Sonnet and Opus.

Is this currently achievable within these sorts of specs? Would doubling my hardware make it achievable, or is that kind of quality only available over an API for now?

submitted by /u/TechnicalyAnIdiot