Today, what hardware to get for running large-ish local models like Qwen 120B?

Reddit r/LocalLLaMA / 3/22/2026

💬 Opinion · Tools & Practical Usage · Models & Research

Key Points

  • The post discusses running large local models, such as quantized Qwen 3.5 variants up to Qwen 120B, alongside proprietary models in a fire-and-forget workflow focused on coding, tool use, and image understanding.
  • It evaluates hardware options within a 10-15k USD budget (RTX Pro 6000, Mac Studio Ultra, or DGX Spark) and asks whether Nvidia's NVFP4 format is the way to maximize inference speed.
  • The envisioned setup uses the local model to do the grunt work while proprietary models handle larger reasoning tasks, aiming to minimize external API usage.
  • It imagines a near-term future where automated workflows could autonomously complete tasks like GitHub issues, reducing active user input and API costs.

Hey,

Tldr: pair local models like a quantized Qwen 3.5 with proprietary models for fire-and-forget work, the local model doing the grunt work. What to buy: RTX Pro 6000? Mac Studio Ultra (wait for the M5)? Or DGX Spark? Inference speed is crucial for quick work. Seems like Nvidia's NVFP4 is the future? Budget: 10-15k USD.

I'm looking to build or upgrade my current rig to run quantized models like Qwen 120B (pick whatever quant level makes sense), primarily for coding, tool usage, and image understanding.
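For reference, this is roughly what I mean by running it locally; a minimal sketch with llama-cpp-python, where the model path, quant, and settings are all placeholders rather than recommendations:

```python
# Minimal sketch of the local side using llama-cpp-python.
# Model filename and settings are placeholders -- pick your own quant.
from llama_cpp import Llama

llm = Llama(
    model_path="qwen-120b-q4_k_m.gguf",  # hypothetical GGUF filename
    n_gpu_layers=-1,   # offload all layers to the GPU(s)
    n_ctx=32768,       # long context for whole-file coding tasks
)

resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a pytest for this function: ..."},
    ],
    max_tokens=1024,
)
print(resp["choices"][0]["message"]["content"])
```

Any OpenAI-compatible server (llama.cpp's llama-server, vLLM, etc.) would slot into the same role.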

I intend to use the local model for inference: writing code and using tools like running scripts and tests, taking screenshots, and driving the browser. But I intend to pair it with proprietary models like Sonnet and Opus for the bigger reasoning. They will be the architects.

The goal: have the large-ish local model do the grunt work, ask the proprietary models for clarifications and help (while heavily limiting proprietary usage), and repeat that in a constant loop until all tasks in the backlog are finished. A fire-and-forget style; see the sketch below.
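In rough Python, the loop I'm imagining looks something like this. Endpoints, model IDs, and the DONE/STUCK signaling are all placeholder assumptions, not a real implementation:

```python
# Sketch of the fire-and-forget loop: the local model grinds, the
# proprietary "architect" is consulted only when the worker is stuck.
# Endpoints and model IDs below are placeholders.
from openai import OpenAI
import anthropic

local = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
architect = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

backlog = ["fix failing test in utils.py", "add screenshot capture to CI"]

def ask_local(prompt: str) -> str:
    resp = local.chat.completions.create(
        model="qwen-120b",  # whatever name the local server exposes
        messages=[{"role": "user", "content": prompt}],
        max_tokens=2048,
    )
    return resp.choices[0].message.content

def ask_architect(prompt: str) -> str:
    # Expensive call -- keep usage to a minimum.
    resp = architect.messages.create(
        model="claude-opus-<version>",  # placeholder model ID
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

while backlog:
    task = backlog.pop(0)
    result = ask_local(f"Complete this task and report DONE or STUCK: {task}")
    if "STUCK" in result:
        plan = ask_architect(f"The worker is stuck on: {task}\n{result}")
        result = ask_local(f"Follow this plan: {plan}\nTask: {task}")
    # ...run tests/tools here, re-queue the task if it still fails...
```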

It feels like we're not far from the reality where I can step away from the PC and have my open GitHub issues completed by the time I return. And we will for sure reach that point sometime soon.

So I don't want to break the bank running only proprietary models via API, and over time the investment in local hardware will pay off.

Thanks!

submitted by /u/romantimm25