New blog post by Daniel Vega-Myhre (Meta/PyTorch) illustrating GEMM design for FP8, including deep dives into the constraints and design challenges introduced by MXFP8.
Link: https://danielvegamyhre.github.io/2026/03/29/mxfp8-gemm.html
Original Tweet: https://x.com/vega_myhre/status/2038293614204445039
Additional resources:
MXFP8 and DeepEP for DeepSeek-V3 on B200 w/ TorchTitan: https://pytorch.org/blog/enabling-up-to-41-faster-pre-training-mxfp8-and-deepep-for-deepseek-v3-on-b200-with-torchtitan/

