Hey everyone,
I'm a programmer and I'd love to use local LLMs as a kind of "superpower" to move faster in my day-to-day work.
Typical use case: I'm working on a codebase (Rust, Python, Go, or TypeScript with React/Vue), and I want the model to understand the existing project and implement new features on top of it — ideally writing code directly in my IDE, like a pair programming partner.
Right now I've tried cloud models like Claude, Qwen, ChatGPT, and GLM. Results are honestly great (especially Claude), but cost and privacy are starting to bother me — hence the interest in going local.
My current setup:
- Ryzen 9 9950X
- 96 GB DDR5 RAM
- GPU still to choose
I'm considering a few options and I'm not sure what makes the most sense:
- Option A: add a GPU
  - Nvidia RTX 5090 (~€3,500)
  - AMD R9700 32 GB (~€1,300)
- Option B: go all-in on a MacBook Pro M5 Max (128 GB RAM, ~€7,000)
My main questions:

1. Are there local LLMs that actually get close to Claude-level performance for coding tasks?
2. Are there solid benchmarks specifically for coding and codebase-aware edits?
3. Which local models are currently best for this kind of workflow?
4. How much VRAM / unified memory do you realistically need for this use case?
5. Dense vs. MoE models: which works better locally?
6. Does generation speed really matter that much in real usage (e.g. 45 tok/s vs. 100+ tok/s)?
7. What tools are people using for this (IDE plugins, local agents, etc.)?
8. How can I test these setups before dropping thousands on hardware?
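On the VRAM question, here's the back-of-envelope math I've been using to compare the options. All the constants are rough assumptions (a ~4.5 bits/weight GGUF-style quant, ~0.5 MB of KV cache per token for a 32B-class model with GQA, a couple of GB of runtime overhead), so treat it as a sketch, not a spec:

```python
# Rough VRAM estimate for a quantized local LLM.
# Every constant below is a back-of-envelope assumption, not a vendor figure.

def estimate_vram_gb(params_b: float,
                     bits_per_weight: float = 4.5,   # ~Q4_K_M-class quant
                     context_tokens: int = 32_768,
                     kv_mb_per_token: float = 0.5,   # plausible for a 32B GQA model
                     overhead_gb: float = 2.0) -> float:
    """Weights + KV cache + runtime overhead, in GB."""
    weights_gb = params_b * bits_per_weight / 8
    kv_cache_gb = context_tokens * kv_mb_per_token / 1024
    return weights_gb + kv_cache_gb + overhead_gb

# A 32B dense coder model at ~4.5 bits with a 32k context:
print(round(estimate_vram_gb(32), 1))  # -> 36.0 GB
```

By that math a 32 GB card is already tight for a 32B model at long context, which is exactly why I'm unsure between the GPU options and 128 GB of unified memory.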
Curious to hear from people who are actually running local setups for real dev work (not just demos). What's your experience like?