Since Nvidia is very vague about the actual specs of its Blackwell pro cards, after some detective work I was able to deduce the theoretical Tensor Core (TC) performance of the B100/B200/B300 chips. I suppose this will be useful for the billionaires here. ;)
From the numbers in this Reddit post by someone who has access to a B200:
https://www.reddit.com/r/nvidia/comments/1khwaw5/battle_of_the_giants_nvidia_blackwell_b200_takes/
We can tell that the B200 has 18,944 CUDA cores and a boost clock of 1965 MHz. This gives a dense FP16 Tensor Core performance of 1191.2 TFLOPS.
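As a sanity check on that figure, the numbers imply a per-core rate of about 32 FP16 FLOPs per clock. Note this per-core rate is my own back-calculation from the two quoted numbers, not an official Nvidia spec:

```python
# Back-of-the-envelope check of the B200 dense FP16 Tensor Core figure.
# The per-core rate below is implied by the quoted numbers, not published:
# 1191.2e12 / (18944 cores * 1.965e9 Hz) ~= 32 FP16 FLOPs/core/clock.
cores = 18944
boost_clock_hz = 1965e6
fp16_flops_per_core_per_clock = 32  # assumption back-calculated above

tflops = cores * boost_clock_hz * fp16_flops_per_core_per_clock / 1e12
print(f"{tflops:.1f} TFLOPS")  # ~1191.2, matching the quoted figure
```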
From these three official Nvidia docs and the numbers I just got:
https://cdn.prod.website-files.com/61dda201f29b7efc52c5fbaf/6602ea9d0ce8cb73fb6de87f_nvidia-blackwell-architecture-technical-brief.pdf
https://resources.nvidia.com/en-us-blackwell-architecture
https://resources.nvidia.com/en-us-blackwell-architecture/blackwell-ultra-datasheet
We can deduce that, essentially, the B100 is an H100 with HBM3e VRAM and FP4 support,
and the B200 is a bigger H100 with HBM3e and FP4 support.
The B300 has exactly the same performance as the B200 except for FP64, TC FP4, and TC INT8. The B300 is sort of like a mix of the B200 and the B202 used in the 5090: it cuts FP64 and TC INT8 performance to 5090 levels to make room for TC FP4, which receives a 50% boost. This translates to dense TC FP4 of 14.29 PFLOPS, vs 9.53 PFLOPS on the B200.
In short, the B300 is a B200 with a 50% boost in FP4. That makes it more suitable for AI workloads, but the cut in FP64 makes it less suitable for scientific/finance workloads.
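The two FP4 figures are consistent with each other. A quick arithmetic check (the 50% uplift is the claim above, not an independent measurement):

```python
# Check that the quoted B300 TC FP4 figure matches a 50% uplift
# over the B200's 9.53 PFLOPS (both numbers as quoted above).
b200_fp4_pflops = 9.53
b300_fp4_pflops = b200_fp4_pflops * 1.5
print(b300_fp4_pflops)  # ~14.295, matching the quoted 14.29 after rounding
```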
This fits my understanding that Blackwell is just a bigger Hopper/Ada with TC FP4 support.