AI Navigate

Nvidia V100 32 GB getting 115 t/s on Qwen Coder 30B A3B Q5

Reddit r/LocalLLaMA / 3/22/2026

📰 News · Developer Stack & Infrastructure · Tools & Practical Usage

Key Points

  • An Nvidia V100 32 GB GPU achieved about 115 tokens per second on Qwen Coder 30B A3B Q5, according to the Reddit post.
  • The user paid roughly $500 including shipping for the V100, noting it is old and loud but still offers strong price-to-performance.
  • The post claims the V100 delivers 20–100% more tokens/sec than an M3 Ultra or M4 Max based on online comparisons, indicating notable value at that price point.
  • They are considering adding three more V100s and linking them with 4x NVLink boards, and are also checking A100 80GB pricing as a potential upgrade.
  • This highlights ongoing interest in repurposing older GPUs for AI inference at low cost, while acknowledging caveats around support and practicality.
Nvidia V100 32 GB getting 115 t/s on Qwen Coder 30B A3B Q5

Just got an Nvidia V100 32 GB mounted on a PCIe adapter card, paid about 500 USD for it (shipping & insurance included) and it’s performing quite well IMO.

Yeah I know there is no more support for it, and it’s old and loud, but it’s hard to beat at that price point. Based on a quick comparison with online data, I’m getting between 20%–100% more tokens/s than an M3 Ultra or M4 Max would on the same models. Again, not too bad for the price.
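For context on figures like the 115 t/s above: decode throughput is just generated tokens divided by wall-clock generation time. A minimal illustrative sketch (the token count and timing below are made-up numbers, not from the post):

```python
def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput: tokens generated divided by wall-clock seconds."""
    return n_tokens / elapsed_s

# Hypothetical run: 1024 tokens generated in ~8.9 s comes out to roughly 115 t/s,
# in the same ballpark as the figure reported for the V100.
print(round(tokens_per_second(1024, 8.9)))
```

Note that published comparisons don't always measure the same thing: prompt-processing (prefill) speed and generation (decode) speed can differ by an order of magnitude, so cross-device numbers are only comparable when both use the same metric.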

Anyone else still using these? Which models are you running with them? I’m looking into getting another 3 and connecting them with those 4x NVLink boards, and also looking into pricing for an A100 80GB.

submitted by /u/icepatfork