Has anyone compared the inference performance of the largest dense model (not sparse or MoE) that will fit on both of the setups below? A rough sketch of the measurement I have in mind follows the list.
* 2x RTX Pro 6000 Blackwell 96GB (workstation edition, not Max-Q) on a PCIe Gen5 x16 bus: NVFP4 quantized
* 2x A100 80GB (Ampere) connected with triple NVLink bridges: W4A16 quantized
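
To make the numbers comparable, I'd run the same offline throughput test on both boxes with the model split across the two GPUs via tensor parallelism. Below is a minimal sketch using vLLM as one possible runner; the checkpoint name is a placeholder (substitute whichever NVFP4 / W4A16 dense model actually fits), and the batch size, output length, and memory utilization values are just example settings, not recommendations.

```python
# Minimal throughput sketch, assuming vLLM is installed and the quantized
# checkpoint fits across the two cards. MODEL is a placeholder, not a real
# checkpoint name.
import time

from vllm import LLM, SamplingParams

MODEL = "path/or/hf-id-of-quantized-dense-model"  # placeholder

llm = LLM(
    model=MODEL,
    tensor_parallel_size=2,       # split the dense model across both GPUs
    gpu_memory_utilization=0.90,  # leave headroom for activations / KV cache
)

prompts = ["Summarize the history of GPU interconnects."] * 32  # example batch
params = SamplingParams(max_tokens=256, temperature=0.0)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Running the identical script (same prompts, same output length) on both setups, only swapping the checkpoint for the matching quantization format, would give a rough apples-to-apples decode-throughput comparison. Single-request latency would need a separate run with batch size 1.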