Scaling Laws are Redundancy Laws
arXiv stat.ML / 3/24/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper argues that deep learning scaling laws can be derived as redundancy laws, giving the scaling exponent a concrete mathematical origin rather than leaving it as an unexplained empirical fit.
- Via kernel regression, it links the excess-risk power-law exponent to the decay of the data covariance spectrum, introducing a redundancy measure, 1/β, that sets the learning-curve slope.
- The authors find that the slope of learning curves is not universal: it varies with data redundancy, and steeper covariance spectra yield faster returns to scale (a numerical sketch follows this list).
- They claim broad universality of the resulting law across boundedly invertible transformations, multimodal mixture data, finite-width approximations, and Transformer models in both the NTK/linearized and feature-learning regimes.
- The work positions itself as the first rigorous, finite-sample mathematical explanation unifying empirical scaling-law observations with a theory grounded in data redundancy.
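To make the spectrum-to-slope link concrete, here is a minimal numerical sketch, not the paper's construction: linear kernel ridge regression on synthetic Gaussian data whose covariance eigenvalues decay as k^(−β). The data model, target weights, noise level, and ridge parameter below are all illustrative assumptions; the only point is that a steeper spectrum (larger β, hence smaller redundancy 1/β) produces a steeper fitted learning curve.

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_risk(beta, n, d=4000, reps=10, lam=1e-2):
    """Mean excess risk of (linear) kernel ridge regression on Gaussian
    data whose covariance spectrum decays as lambda_k = k**(-beta).
    All hyperparameters here are illustrative assumptions."""
    eigs = np.arange(1, d + 1, dtype=float) ** (-beta)
    w_star = rng.standard_normal(d)                      # hypothetical target
    risks = []
    for _ in range(reps):
        # columns scaled by sqrt(lambda_k) => covariance = diag(eigs)
        X = rng.standard_normal((n, d)) * np.sqrt(eigs)
        y = X @ w_star + 0.1 * rng.standard_normal(n)    # noisy labels
        # dual (kernel) form of ridge regression, efficient when d >> n:
        # w_hat = X^T (X X^T + lam I)^{-1} y
        K = X @ X.T
        alpha = np.linalg.solve(K + lam * np.eye(n), y)
        w_hat = X.T @ alpha
        # population excess risk: delta^T Sigma delta, Sigma = diag(eigs)
        delta = w_hat - w_star
        risks.append(float(np.sum(eigs * delta ** 2)))
    return np.mean(risks)

ns = np.array([50, 100, 200, 400, 800])
for beta in (1.2, 2.0):               # flatter (more redundant) vs. steeper
    risks = [excess_risk(beta, n) for n in ns]
    # slope of log risk vs. log n estimates the learning-curve exponent
    slope = np.polyfit(np.log(ns), np.log(risks), 1)[0]
    print(f"beta = {beta}: fitted excess-risk exponent ~ {-slope:.2f}")
```

Under these assumptions the fitted exponent for β = 2.0 should come out larger than for β = 1.2, mirroring the "faster returns to scale" claim; the exact values depend on the dimension, noise level, and regularization, so only the ordering, not the numbers, is the takeaway.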