AI Navigate

4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing??

Reddit r/LocalLLaMA / 3/11/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • A user shares their experience building a local AI setup from four 32GB V100 SXM GPUs connected via NVLink on a single carrier board, optimized for running large models on a budget.
  • The configuration offers a pooled 128GB of VRAM at 900GB/s NVLink bandwidth, which the poster says the system treats as a single GPU, avoiding the usual multi-GPU latency overhead.
  • Adding a second NVLink board through the same PCIe host card brings the total to 256GB of VRAM (two 128GB pools) for under $5,000, a cost-effective alternative to newer-generation GPUs.
  • The poster, a lawyer, sticks to local models for productivity tasks like document organization and financial analysis, citing ethical concerns about sending client work to frontier AI models.
  • Despite being roughly two generations old, the V100 SXM setup meets the poster’s needs, raising the question of how widely known and accessible this hardware option is.
4 32 gb SXM V100s, nvlinked on a board, best budget option for big models. Or what am I missing??

Just wondering why I only see a few posts about what’s become the core of my setup. I’m a lawyer who has to stay local for the most interesting productivity-enhancing uses of AI. Even if there’s a 0.01% chance of real ethical consequences from using frontier models, I’m not going to risk it. Also, for document organization, form generation, financial extraction and analysis, and pattern matching, I don’t need Opus 4.6.

But I want to run the best local models to crunch and organize my documents and eventually replicate my work product.

I went on a GPU buying binge, and I just don’t see what I’m missing: V100s on an NVLink board are the best bang for the buck I can find.

Buy four 32GB V100 SXM cards with heatsinks for about $1,600, then get the AOM SXM board and PEX card for about $750. That’s 128GB of unified NVLink VRAM for roughly $2,400: 900GB/s of bandwidth and a single 128GB pool.
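
For reference, the post’s cost math works out as a quick sketch below; the line items come straight from the post, and the dollar-per-GB figure is just derived arithmetic:

```python
# Budget math using the post's own numbers (USD); nothing here is a quoted
# price beyond what the post states.
gpu_cost = 1600                 # four 32GB V100 SXM cards + heatsinks
board_cost = 750                # AOM SXM carrier board + PEX card
total = gpu_cost + board_cost   # 2350, which the post rounds to ~2400
vram_gb = 4 * 32                # 128GB pooled over NVLink

print(f"total: ${total}, VRAM: {vram_gb}GB, ${total / vram_gb:.2f}/GB")
# -> total: $2350, VRAM: 128GB, $18.36/GB
```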

I feel like people don’t understand how significant it is that these four cards are connected on the board via NVLink. It’s effectively one huge pool of VRAM, with minimal latency, and the system sees it as a single GPU.
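
If you want to confirm the NVLink links are actually live on a board like this, a minimal PyTorch check looks something like the sketch below (assuming CUDA drivers and PyTorch are installed; device indices 0–3 are an assumption for this board):

```python
import torch

# Enumerate the visible GPUs; an SXM board like this should show four V100s.
n = torch.cuda.device_count()
for i in range(n):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

# Peer access is what lets one card read another's VRAM directly over NVLink,
# which is what makes treating the four cards as one pool practical.
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```

Running `nvidia-smi topo -m` shows the same link topology at the driver level.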

With the PEX PCIe card, you can actually run two of those boards off one PCIe slot. That’s 256GB (2×128GB, two separate pools) of 900GB/s VRAM for under $5k. You just need an x16 PCIe slot and enough PSU capacity (the cards run well at roughly 200 watts peak each, so 800 or 1,600 watts for the GPUs). Those are today’s prices.
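
The PSU sizing in that paragraph works out as follows; this is just the post’s own per-card figure multiplied out, and a real build also needs headroom for CPU, motherboard, and fans:

```python
peak_per_card_w = 200            # the post's observed peak draw per V100
one_board = 4 * peak_per_card_w  # 800 W for a single 4-card board
two_boards = 2 * one_board       # 1600 W for the dual-board, 256GB setup

print(f"one board: {one_board} W, two boards: {two_boards} W")
```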

I know it’s about two generations old, but everything I run on it works well.
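
As a concrete example of running a big model on a pool like this, here’s a minimal sketch of sharding one model across the four cards, assuming the Hugging Face transformers and accelerate packages; the model ID is a placeholder, not something named in the post. float16 matters here because Volta-era V100s don’t support bfloat16.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-70b-model"  # placeholder; pick anything that fits in 128GB

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # V100 is Volta: fp16 yes, bf16 no
    device_map="auto",          # lets accelerate split layers across the 4 GPUs
)

prompt = "Extract the payment terms from the following contract clause:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```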

Does nobody know about Alibaba, or what?

submitted by /u/TumbleweedNew6515