Adaptive Block-Scaled Data Types

arXiv cs.CL / March 31, 2026


Key Points

  • The paper identifies a key limitation of NVFP4 4-bit quantization: its error distribution can produce disproportionately large quantization errors near the maximum values within each 16-value block.
  • It proposes Adaptive Block-Scaled Data Types, notably IF4, which chooses between FP4 and INT4 for each 16-value group; the chosen format is signaled by the otherwise-unused sign bit of the group's E4M3 scale factor, letting the representation better match the input distribution.
  • The authors extend the same adaptive concept to other bit-widths, including IF3 and IF6, aiming to improve quantization behavior beyond fixed-format schemes.
  • Experiments on language models show that IF4 reduces loss during quantized training and improves accuracy in post-training quantization compared with existing 4-bit block-scaled formats.
  • To demonstrate deployability, the authors design and evaluate an IF4 Multiply-Accumulate (MAC) unit and release their code in the linked GitHub repository, indicating that IF4 can be implemented efficiently in next-generation hardware accelerators.
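
The uneven error profile in the first point can be seen directly from the FP4 (E2M1) grid, whose representable magnitudes are 0, 0.5, 1, 1.5, 2, 3, 4, and 6: the gap between adjacent code points grows toward the maximum, so near-maximal values round badly. A minimal numeric illustration (the grid is the standard E2M1 code-point set; the example value is ours, not the paper's):

```python
# Positive FP4 (E2M1) code points; note the gaps widen toward the max.
fp4 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def nearest(x, grid):
    """Round x to the closest representable grid point."""
    return min(grid, key=lambda g: abs(g - x))

# Worst-case rounding error in an interval is half its width:
# 0.25 between 0.5 and 1.0, but a full 1.0 between 4 and 6.
x = 5.0                      # a near-maximal value after block scaling
q = nearest(x, fp4)          # rounds down to 4.0 (ties break low here)
assert abs(q - x) == 1.0     # ~20% relative error near the top of the range
assert abs(nearest(0.7, fp4) - 0.7) < 0.25  # much smaller error at low magnitudes
```

By contrast, a uniform INT4 grid spends its code points evenly, which is exactly the trade-off the adaptive format exploits.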

Abstract

NVFP4 has grown increasingly popular as a 4-bit format for quantizing large language models due to its hardware support and its ability to retain useful information with relatively few bits per parameter. However, the format is not without limitations: recent work has shown that NVFP4 suffers from its error distribution, resulting in large amounts of quantization error on near-maximal values in each group of 16 values. In this work, we leverage this insight to design new Adaptive Block-Scaled Data Types that can adapt to the distribution of their input values. For four-bit quantization, our proposed IF4 (Int/Float 4) data type selects between FP4 and INT4 representations for each group of 16 values, which are then scaled by an E4M3 scale factor as is done with NVFP4. The selected data type is denoted using the scale factor's sign bit, which is currently unused in NVFP4, and we apply the same insight to design formats for other bit-widths, including IF3 and IF6. When used to quantize language models, we find that IF4 outperforms existing 4-bit block-scaled formats, achieving lower loss during quantized training and achieving higher accuracy on many tasks in post-training quantization. We additionally design and evaluate an IF4 Multiply-Accumulate (MAC) unit to demonstrate that IF4 can be implemented efficiently in next-generation hardware accelerators. Our code is available at https://github.com/mit-han-lab/fouroversix.
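As a rough sketch of the mechanism the abstract describes, the per-block choice can be emulated by quantizing each 16-value block with both grids and keeping whichever gives lower error; in hardware the choice would ride on the otherwise-unused sign bit of the E4M3 scale. Everything below (function names, the symmetric INT4 grid, mean-squared error as the selection criterion, amax-based scaling) is our illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

# Assumed grids: FP4 E2M1 code points (standard) and a symmetric INT4 grid.
_FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-_FP4_POS[:0:-1], _FP4_POS])
INT4_GRID = np.arange(-7, 8, dtype=np.float64)

def quantize_to_grid(x, grid):
    """Round each element of x to its nearest grid point."""
    idx = np.abs(x[:, None] - grid[None, :]).argmin(axis=1)
    return grid[idx]

def if4_quantize_block(block):
    """Hypothetical IF4-style choice for one 16-value block: scale so the
    block max lands on the grid max, fake-quantize with both formats, and
    keep the lower-MSE result. Returns (dequantized block, chosen format)."""
    amax = np.abs(block).max()
    if amax == 0:
        return block.copy(), "fp4"
    best = None
    for name, grid in [("fp4", FP4_GRID), ("int4", INT4_GRID)]:
        scale = amax / grid.max()      # per-block scale (E4M3 in hardware)
        deq = quantize_to_grid(block / scale, grid) * scale
        err = np.mean((deq - block) ** 2)
        if best is None or err < best[0]:
            best = (err, deq, name)
    return best[1], best[2]
```

Under this sketch, a block of small integers is reproduced exactly by INT4 while FP4's coarse upper gaps introduce error, so INT4 is selected; a block lying on the FP4 grid flips the choice the other way.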