Local LLM setup for coding (pair programming style) - GPU vs MacBook Pro?

Reddit r/LocalLLaMA / 4/20/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A developer is looking to run local LLMs for pair-programming-style coding assistance in their IDE, using the model to understand an existing codebase and implement new features.
  • They are weighing the cost and privacy benefits of moving from cloud models (Claude, Qwen, ChatGPT, GLM) to local hardware, with specific GPU and MacBook Pro M5 Max options under consideration.
  • Key questions include whether any local models can approach Claude-level coding performance, which models are best for codebase-aware edits, and what realistic VRAM or unified memory requirements are.
  • They also want guidance on model architecture tradeoffs (dense vs MoE), whether generation speed materially affects day-to-day usefulness, and what tooling (IDE plugins/local agents) people use in practice.
  • They’re asking for practical ways to benchmark and test setups before spending thousands on hardware.

Hey everyone,

I'm a programmer and I'd love to use local LLMs as a kind of "superpower" to move faster in my day-to-day work.

Typical use case: I'm working on a codebase (Rust, Python, Go, or TypeScript with React/Vue), and I want the model to understand the existing project and implement new features on top of it — ideally writing code directly in my IDE, like a pair programming partner.

Right now I've tried cloud models like Claude, Qwen, ChatGPT, and GLM. Results are honestly great (especially Claude), but cost and privacy are starting to bother me — hence the interest in going local.

My current setup:

  • Ryzen 9 9950X
  • 96 GB DDR5 RAM
  • GPU still to choose

I'm considering a few options and I'm not sure what makes the most sense:

  • Option A: Add a GPU
      • Nvidia 5090 (~€3500)
      • AMD R9700 32 GB (~€1300)
  • Option B: Go all-in on a MacBook Pro M5 Max (128 GB RAM, ~€7000)

My main questions:

  1. Are there local LLMs that actually get close to Claude-level performance for coding tasks?

  2. Are there solid benchmarks specifically for coding + codebase-aware edits?

  3. Which local models are currently best for this kind of workflow?

  4. How much VRAM / unified memory do you realistically need for this use case?

  5. Dense vs MoE models - what works better locally?

  6. Does generation speed really matter that much? (e.g. 45 tok/s vs 100+ tok/s in real usage)

  7. What tools are people using for this? (IDE plugins, local agents, etc.)

  8. How can I test these setups before dropping thousands on hardware?
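For the VRAM question, a back-of-the-envelope estimate goes a long way before buying anything: quantized weights plus the KV cache dominate memory use. This is only a sketch; the model dimensions in the example are hypothetical, and real runtimes add overhead for activations and framework buffers on top of this:

```python
# Rough VRAM estimate for a locally hosted model: weights + KV cache.
# All figures are approximations, not a guarantee of what a runtime allocates.

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     n_layers: int, kv_dim: int, context: int,
                     kv_bytes: int = 2) -> float:
    """params_b: parameters in billions; bits_per_weight: effective bits
    after quantization (4-bit formats land around 4.5 with metadata);
    kv_dim: per-layer K/V width after any grouped-query sharing;
    kv_bytes: bytes per cache element (2 for fp16)."""
    weights = params_b * 1e9 * bits_per_weight / 8
    # KV cache: two tensors (K and V) per layer, one entry per context token.
    kv_cache = 2 * n_layers * kv_dim * context * kv_bytes
    return (weights + kv_cache) / 1e9

# Example: a hypothetical 32B dense model at ~4.5 effective bits,
# 64 layers, 1024-wide shared K/V, 32k context, fp16 cache.
print(round(estimate_vram_gb(32, 4.5, 64, 1024, 32768), 1))  # ≈ 26.6 GB
```

By this kind of estimate, a 32 GB card (R9700 or 5090) comfortably fits a 4-bit ~32B dense model with a long context, while 70B-class dense models or large MoE checkpoints are where 128 GB of unified memory starts to pay off.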

Curious to hear from people who are actually running local setups for real dev work (not just demos). What's your experience like?

submitted by /u/bajis12870