Hey,
TL;DR: pair a quantized local model like Qwen 3.5 with proprietary models for fire-and-forget work, with the local model doing the grunt work. What to buy: RTX Pro 6000, Mac Ultra (wait for M5?), or DGX Spark? Inference speed is crucial for quick work. Seems like NVIDIA's NVFP4 is the future? Budget: 10-15k USD.
I'm looking to build or upgrade my current rig to run quantized models like Qwen 120B (pick whatever quant level makes sense), primarily for coding, tool usage, and image understanding.
I intend to use the local model for writing code and driving tools: running scripts and tests, taking screenshots, using the browser. For the bigger reasoning I'd pair it with proprietary models like Sonnet and Opus, which would act as the architects.
The goal: have the large-ish local model do the grunt work, ask the proprietary models for clarifications and help (while heavily limiting proprietary usage), and run that in a constant loop until every task in the backlog is finished. Fire and forget. A rough sketch of the loop is below.
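For concreteness, a minimal sketch of the loop I mean, assuming both models are reachable through OpenAI-compatible endpoints (llama.cpp's llama-server and vLLM both expose one locally); the model names, the backlog, and the "UNSURE" escalation check are all placeholders I made up:

```python
from openai import OpenAI

# Local quantized model served at an OpenAI-compatible endpoint
# (e.g. llama-server or vLLM on port 8080); names are placeholders.
local = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
# Stand-in for the proprietary architect; swap in the real SDK/endpoint.
architect = OpenAI()

backlog = ["fix failing test in auth module", "add screenshot diff check"]

def ask(client, model, prompt):
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

for task in backlog:
    attempt = ask(local, "qwen-120b-q4",
                  f"Complete this task, reply UNSURE if stuck: {task}")
    # Escalate only when the local model is stuck, keeping proprietary spend low.
    if "UNSURE" in attempt:
        plan = ask(architect, "opus", f"Plan the approach for: {task}")
        attempt = ask(local, "qwen-120b-q4",
                      f"Follow this plan:\n{plan}\nTask: {task}")
    print(task, "->", attempt[:80])
```

In practice the worker step would be an agent with tool access (shell, tests, browser) rather than a single completion, but the escalation shape stays the same.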
It feels like we're not far from the reality where I can step away from the PC and come back to my open GitHub issues completed, and we'll surely reach that point soon.
So I don't want to break the bank running only proprietary models via API, and over time the investment in local hardware should pay off.
Thanks!
