Scaling Laws are Redundancy Laws
arXiv stat.ML / 2026/3/24
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key points
- The paper argues that deep learning scaling laws can be derived as redundancy laws, giving the power-law scaling exponent a concrete mathematical origin rather than leaving it unexplained.
- Using kernel regression, it links the excess-risk power-law exponent to the decay of the data covariance spectrum, introducing a redundancy measure 1/β that sets the learning-curve slope (see the numerical sketch after this list).
- The authors find that the learning-curve slope is not universal but varies with data redundancy: steeper covariance spectra yield faster returns to scale.
- They claim the resulting law is broadly universal, holding under boundedly invertible transformations, for multimodal mixture data, under finite-width approximations, and for Transformer models in both the NTK/linearized and feature-learning regimes.
- The work positions itself as the first rigorous finite-sample mathematical explanation, unifying empirical scaling-law observations with a theoretical foundation grounded in data redundancy.
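As a rough illustration of the spectral mechanism described in the second and third bullets, here is a minimal numerical sketch (not the authors' code): it draws Gaussian features whose covariance eigenvalues decay as a power law i^(-a), fits ridge regression at increasing sample sizes n, and measures the log-log slope of the resulting excess-risk curve. The spectral exponent a, dimension d, noise level, and ridge strength are all illustrative assumptions; the only point is that the slope of the learning curve tracks the decay of the covariance spectrum rather than a universal constant.

```python
# Minimal sketch, assuming a linear teacher over Gaussian features whose
# covariance spectrum decays as lambda_i ~ i^(-a). Not the paper's setup or
# notation; parameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def excess_risk(n, d=1000, a=1.5, noise=0.1, ridge=1e-4, trials=5):
    """Average excess risk of ridge regression trained on n samples."""
    lam = np.arange(1, d + 1, dtype=float) ** (-a)   # power-law covariance spectrum
    risks = []
    for _ in range(trials):
        w_star = rng.standard_normal(d)              # random teacher weights
        X = rng.standard_normal((n, d)) * np.sqrt(lam)   # features with cov = diag(lam)
        y = X @ w_star + noise * rng.standard_normal(n)
        A = X.T @ X + n * ridge * np.eye(d)          # small fixed ridge for stability
        w_hat = np.linalg.solve(A, X.T @ y)
        # population excess risk: E[(x^T (w_hat - w_star))^2] = sum_i lam_i * dw_i^2
        risks.append(np.sum(lam * (w_hat - w_star) ** 2))
    return float(np.mean(risks))

ns = np.array([50, 100, 200, 400, 800])
for a in (1.2, 1.5, 2.0):
    risks = [excess_risk(n, a=a) for n in ns]
    slope = np.polyfit(np.log(ns), np.log(risks), 1)[0]   # fitted learning-curve exponent
    print(f"spectral decay a={a}: fitted learning-curve slope ~ {slope:.2f}")
```

Running this, the fitted slope should become more negative as the spectral decay a grows, mirroring the summary's claim that steeper covariance spectra (a redundancy property of the data) yield faster returns to scale, while flatter spectra flatten the learning curve.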




