Hey,
TL;DR: pair a quantized local model like Qwen 3.5 with proprietary models for fire-and-forget work, with the local model doing the grunt work. What to buy: RTX Pro 6000, Mac Ultra (wait for M5?), or DGX Spark? Inference speed is crucial for quick work. Seems like NVIDIA's NVFP4 is the future? Budget: 10-15k USD.
I'm looking to build or upgrade my current rig to run quantized models like Qwen 120B (pick whatever quant level makes sense), primarily for coding, tool usage, and image understanding.
I intend to use the local model for writing code and driving tools: running scripts and tests, taking screenshots, using the browser. For the bigger reasoning I'd pair it with proprietary models like Sonnet and Opus, which would act as the architects.
The goal: have the large-ish local model do the grunt work, ask the proprietary models for clarifications and help (while heavily limiting proprietary usage), and run that in a constant loop until every task in the backlog is finished. Fire and forget. A rough sketch of the loop is below.
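For concreteness, a minimal sketch of the loop I mean, assuming both models are reachable through OpenAI-compatible endpoints (llama.cpp's llama-server and vLLM both expose one locally); the model names, the backlog, and the "UNSURE" escalation check are all placeholders I made up:

```python
from openai import OpenAI

# Local quantized model served at an OpenAI-compatible endpoint
# (e.g. llama-server or vLLM on port 8080); names are placeholders.
local = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
# Stand-in for the proprietary architect; swap in the real SDK/endpoint.
architect = OpenAI()

backlog = ["fix failing test in auth module", "add screenshot diff check"]

def ask(client, model, prompt):
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

for task in backlog:
    attempt = ask(local, "qwen-120b-q4",
                  f"Complete this task, reply UNSURE if stuck: {task}")
    # Escalate only when the local model is stuck, keeping proprietary spend low.
    if "UNSURE" in attempt:
        plan = ask(architect, "opus", f"Plan the approach for: {task}")
        attempt = ask(local, "qwen-120b-q4",
                      f"Follow this plan:\n{plan}\nTask: {task}")
    print(task, "->", attempt[:80])
```

In practice the worker step would be an agent with tool access (shell, tests, browser) rather than a single completion, but the escalation shape stays the same.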
It feels like we're not far from the reality where I can step away from the PC and come back to my open GitHub issues completed, and we'll surely reach that point soon.
So I don't want to break the bank running only proprietary models via API, and over time the investment in local hardware should pay off.
Thanks!
