Has anyone compared the inference performance of the largest dense model (not sparse or MoE) that will fit on both of the setups below? A rough sketch of the measurement I have in mind follows the list.
* 2x RTX Pro 6000 Blackwell 96GB (workstation edition, not Max-Q) on a PCIe Gen5 x16 bus: NVFP4 quantized
* 2x A100 80GB (Ampere) connected with triple NVLink bridges: W4A16 quantized
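
To make the numbers comparable, I'd run the same offline throughput test on both boxes with the model split across the two GPUs via tensor parallelism. Below is a minimal sketch using vLLM as one possible runner; the checkpoint name is a placeholder (substitute whichever NVFP4 / W4A16 dense model actually fits), and the batch size, output length, and memory utilization values are just example settings, not recommendations.

```python
# Minimal throughput sketch, assuming vLLM is installed and the quantized
# checkpoint fits across the two cards. MODEL is a placeholder, not a real
# checkpoint name.
import time

from vllm import LLM, SamplingParams

MODEL = "path/or/hf-id-of-quantized-dense-model"  # placeholder

llm = LLM(
    model=MODEL,
    tensor_parallel_size=2,       # split the dense model across both GPUs
    gpu_memory_utilization=0.90,  # leave headroom for activations / KV cache
)

prompts = ["Summarize the history of GPU interconnects."] * 32  # example batch
params = SamplingParams(max_tokens=256, temperature=0.0)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```

Running the identical script (same prompts, same output length) on both setups, only swapping the checkpoint for the matching quantization format, would give a rough apples-to-apples decode-throughput comparison. Single-request latency would need a separate run with batch size 1.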