| Just got an Nvidia V100 32 GB on a PCIe card, paid about 500 USD for it (shipping and insurance included), and it's performing quite well IMO. Yeah, I know it's no longer supported, it's old, and it's loud, but it's hard to beat at that price point. Based on a quick comparison, I'm getting 20-100% more tokens/s than an M3 Ultra or M4 Max would on the same models (compared with online data); again, not bad for the price. Anyone else still using these? Which models are you running on them? I'm looking into getting another 3 and connecting them with those 4x NVLink boards, and also looking into pricing for an A100 80GB. |
Nvidia V100 32 GB getting 115 t/s on Qwen Coder 30B A3B Q5
Reddit r/LocalLLaMA / 3/22/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage
Key Points
- An Nvidia V100 32 GB GPU achieved about 115 tokens per second on Qwen Coder 30B A3B Q5, according to the Reddit post.
- The user paid roughly $500 including shipping for the V100, noting it is old and loud but still offers strong price-to-performance.
- The post claims the V100 delivers 20-100% more tokens/sec than an M3 Ultra or M4 Max on the same models, based on online comparisons rather than side-by-side testing.
- They are considering adding three more V100s and linking them with 4x NVLink boards, and are also exploring A100 80GB pricing as a potential upgrade.
- This highlights ongoing interest in repurposing older GPUs for AI inference at low cost, while acknowledging caveats around support and practicality.
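A quick sanity check on why a single 32 GB V100 can host this model at all: a Q5-family quantization averages roughly 5.5 bits per weight (an assumption; exact GGUF sizes vary by quant variant, and KV cache and activations need additional headroom). A minimal sketch of the arithmetic:

```python
# Back-of-envelope VRAM estimate for Qwen Coder 30B at a Q5 quant.
# Assumption: ~5.5 bits per weight on average (Q5_K-style); real file
# sizes differ slightly by variant, and runtime overhead is not counted.
PARAMS = 30e9          # total parameters (30B)
BITS_PER_WEIGHT = 5.5  # assumed average for a Q5 quant
VRAM_GB = 32           # V100 32 GB card from the post

weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"approx. quantized weights: {weights_gb:.1f} GB")
print("fits in one V100 (weights only):", weights_gb < VRAM_GB)
```

The weights alone come to roughly 20-21 GB, leaving headroom on a 32 GB card for context; with an A3B mixture-of-experts model only ~3B parameters are active per token, which helps explain the high tokens/sec figure.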