MXNorm: Reusing MXFP block scales for efficient tensor normalisation
arXiv cs.LG / 3/16/2026
Key Points
- MXNorm is a drop-in replacement for RMSNorm that estimates the RMS using only the MXFP8 block scales, shrinking the normalization reduction by 32x (one scale per 32-element block instead of one value per element).
- The method is validated on pre-training of Llama 3 models (125M, 1B, 8B) with minimal accuracy loss compared to an RMSNorm baseline.
- It achieves kernel speedups of up to 2.4x using only torch.compile, and end-to-end transformer-layer speedups of around 1.3% for Llama 3 8B in MXFP8 and 2.6% in NVFP4.
- As a hardware-conscious optimization that reuses existing MXFP8 scales, MXNorm reduces normalization compute and improves efficiency without requiring major changes to model code.
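The core idea above can be sketched numerically: since MXFP8 already stores one shared power-of-two scale per 32-element block (derived from the block's max magnitude), those scales can serve as a coarse summary of per-block magnitudes, and the RMS can be estimated from them alone. The following NumPy sketch illustrates this under stated assumptions; the function names, the max-based scale derivation, and the calibration constant `c` are illustrative, not the paper's exact estimator.

```python
import numpy as np

def mx_block_scales(x, block=32):
    """Illustrative MXFP8-style shared scales: one power-of-two exponent
    per 32-element block, derived from the block's max magnitude.
    (Assumption: a simple floor(log2(amax)) rule stands in for the real
    E8M0 scale selection.)"""
    xb = x.reshape(-1, block)
    amax = np.abs(xb).max(axis=1)
    return np.exp2(np.floor(np.log2(amax + 1e-30)))

def approx_rms(x, block=32, c=1.0):
    """Estimate RMS(x) from the block scales alone: a reduction over
    N/32 scales instead of N elements. `c` is a hypothetical calibration
    constant; the paper's estimator may correct for the max-vs-RMS gap."""
    s = mx_block_scales(x, block)
    return c * np.sqrt(np.mean(s ** 2))

def exact_rms(x):
    """Reference RMSNorm-style reduction over all elements."""
    return np.sqrt(np.mean(x ** 2))
```

For Gaussian inputs the scale-based estimate tracks the true RMS up to a roughly constant factor (block maxima sit a few standard deviations above the RMS), which is why a single calibration constant can absorb the bias while the reduction itself shrinks by 32x.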