A few days ago I switched to Linux to try vLLM out of curiosity. I ended up creating a 100% local, parallel, multi-agent setup with Claude Code and gpt-oss-120b for concurrent vibecoding and orchestration with CC's Agent Teams, entirely offline. This video shows 4 agents collaborating.

Reddit r/LocalLLaMA / 3/22/2026

💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • The author built a 100% local, offline, parallel multi-agent setup on Linux using vLLM with Claude Code and gpt-oss-120b for vibecoding and orchestration.
  • They run vLLM in a Docker container and point to the vLLM localhost endpoint rather than a cloud provider, enabling local inference orchestration via Agent Teams.
  • They achieved up to 8 agents running in parallel on a single GPU (RTX Pro 6000 Blackwell MaxQ) and observed substantial speedups, with tasks that previously took hours now around 30 minutes, scalable to tens of agents with more hardware.
  • The setup contrasts with Ollama and LM Studio, which processed requests sequentially and slowed down; switching to Linux (dual-boot with Ubuntu) eliminated Windows bottlenecks and improved performance.

This isn't a repo; it's just how my Linux workstation is built. My setup was the following:

  • vLLM Docker container - for easy deployment and parallel inference.

  • Claude Code - vibecoding and Agent Teams orchestration. Points at vLLM localhost endpoint instead of a cloud provider.

  • gpt-oss:120b - Coding agent.

  • RTX Pro 6000 Blackwell MaxQ - GPU workhorse

  • Dual-boot Ubuntu
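A minimal sketch of what the serving side of a setup like this can look like, assuming vLLM's official Docker image and the public Hugging Face model id. The environment variable names at the bottom are assumptions about how the client-side endpoint override works; Claude Code natively speaks the Anthropic API while vLLM exposes an OpenAI-compatible one, so a translation proxy in front of vLLM may also be needed:

```shell
# Launch vLLM's OpenAI-compatible server in Docker (image, ports, and flags
# follow the vLLM Docker deployment docs; adjust the model id as needed).
docker run --runtime nvidia --gpus all \
  -p 8000:8000 --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model openai/gpt-oss-120b

# Point the coding agent at the local endpoint instead of a cloud provider.
# (Variable names here are assumptions, not confirmed Claude Code config;
# vLLM does not check the API key by default.)
export ANTHROPIC_BASE_URL="http://localhost:8000"
export ANTHROPIC_API_KEY="dummy-key"
```

Once the container is up, the endpoint can be smoke-tested with a plain `curl http://localhost:8000/v1/models` before wiring any agent to it.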

I never realized how much Windows was holding back my PC and agents until I switched to Linux. It was so empowering to make the move to dual-boot Ubuntu and hop onto vLLM.

Back then, I had to choose between Ollama and LM Studio for vibecoding, but both processed requests sequentially and slowed down quickly after a few message turns and tool calls, so my coding agent was always handicapped by their slower processing.

Then along came vLLM, and it turbocharged my experience. In the video I showed 4 agents at work, but I've run 8 agents in parallel on my GPU continuously without any issues beyond reduced throughput (which varies greatly depending on the agent).

Agent Team-scale tasks that used to take hours to complete one-by-one can now be done in about 30 minutes, depending on the scope of the project. That means if I were to purchase a second MaxQ later this year, the number of concurrent agents could easily rise into the tens!
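The speedup above is just parallelism: if subtasks are independent, running them concurrently takes roughly the time of the longest one instead of the sum. A toy sketch, using `sleep` as a stand-in for an agent's subtask (real tasks would be LLM calls batched by vLLM; the numbers are illustrative, not measurements from the author's rig):

```shell
#!/bin/sh
# Mock one agent subtask as a 1-second sleep.
task() {
  sleep 1
}

# Sequential: 4 tasks one after another (~4s total).
start=$(date +%s)
for i in 1 2 3 4; do task; done
seq_elapsed=$(( $(date +%s) - start ))

# Parallel: the same 4 tasks launched concurrently (~1s total).
start=$(date +%s)
for i in 1 2 3 4; do task & done
wait
par_elapsed=$(( $(date +%s) - start ))

echo "sequential: ${seq_elapsed}s, parallel: ${par_elapsed}s"
```

With 4 mocked tasks the sequential loop takes about 4 seconds and the parallel one about 1, which is the same shape of win the post describes, scaled down.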

This would theoretically let me vibecode multiple projects locally and concurrently. Even as the best-case scenario for my PC, that setup could add some latency here and there, but it would still be far better than painstakingly getting a single agent to complete projects one at a time.

submitted by /u/swagonflyyyy