Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models

arXiv stat.ML / 4/17/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that classical low-dimensional intuitions fail in modern high-dimensional, overparameterized ML/DNN settings where data size, feature dimension, and parameter count are all comparable.
  • It extends Random Matrix Theory (RMT) beyond eigenvalue analysis of linear models to treat nonlinear models such as deep neural networks in the proportional high-dimensional regime.
  • The authors propose “High-dimensional Equivalent,” a framework that unifies Deterministic Equivalent and Linear Equivalent to handle high dimensionality, nonlinearity, and generic eigenspectral functionals.
  • Using this framework, the paper provides precise characterizations of both training and generalization for linear models, nonlinear shallow networks, and deep networks, explaining phenomena such as scaling laws and double descent (a toy simulation of double descent follows this list).
  • Overall, the work aims to deliver a unified theoretical lens for understanding deep learning behavior in high-dimensional regimes, including nonlinear learning dynamics.

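The double-descent behavior referenced above is easy to reproduce empirically. The following minimal sketch is not taken from the paper; the sample sizes, feature counts, and noise level are all illustrative. It fits a minimum-norm least-squares model on a growing number of features and prints the test error, which peaks near the proportional threshold p/n ≈ 1 and then decreases again as the model becomes more overparameterized.

```python
# Minimal sketch (illustrative, not from the paper): double descent for the
# minimum-norm least-squares interpolator as the ratio p/n crosses 1.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p_max, sigma = 200, 2000, 600, 0.5

# Ground-truth signal lives in the full p_max-dimensional feature space.
beta = rng.standard_normal(p_max) / np.sqrt(p_max)

X_train = rng.standard_normal((n_train, p_max))
X_test = rng.standard_normal((n_test, p_max))
y_train = X_train @ beta + sigma * rng.standard_normal(n_train)
y_test = X_test @ beta + sigma * rng.standard_normal(n_test)

for p in [20, 50, 100, 150, 180, 195, 205, 250, 400, 600]:
    # Regress on the first p features only; the discarded features act as model misspecification.
    beta_hat = np.linalg.pinv(X_train[:, :p]) @ y_train  # min-norm solution when p >= n
    test_mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    print(f"p/n = {p / n_train:4.2f}   test MSE = {test_mse:.3f}")
```

The test error rises sharply as p/n approaches 1 (the interpolation threshold) and falls again beyond it, which is the pattern the paper's high-dimensional analysis characterizes exactly.
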
Abstract

Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data and rely on overparameterized models, where classical low-dimensional intuitions break down. In particular, the proportional regime, where the data dimension, sample size, and number of model parameters are all large and comparable, gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models such as DNNs in this regime. We introduce the concept of High-dimensional Equivalent, which unifies and generalizes both Deterministic Equivalent and Linear Equivalent, to systematically address three technical challenges: high dimensionality, nonlinearity, and the need to analyze generic eigenspectral functionals. Leveraging this framework, we provide precise characterizations of the training and generalization performance of linear models, nonlinear shallow networks, and deep networks. Our results capture rich phenomena, including scaling laws, double descent, and nonlinear learning dynamics, offering a unified perspective on the theoretical understanding of deep learning in high dimensions.
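
To make the notion of a Deterministic Equivalent concrete, here is the textbook example for a sample covariance matrix; this is a standard random-matrix fact, not a result specific to this paper. For $X \in \mathbb{R}^{p \times n}$ with i.i.d. zero-mean, unit-variance entries and $p/n \to c \in (0, \infty)$, the resolvent of $\hat{\Sigma} = \frac{1}{n} X X^\top$ admits the deterministic equivalent

$$
Q(z) = \bigl(\hat{\Sigma} - z I_p\bigr)^{-1} \;\leftrightarrow\; \bar{Q}(z) = m(z)\, I_p,
\qquad c\, z\, m(z)^2 - (1 - c - z)\, m(z) + 1 = 0,
$$

where $m(z)$ is the Stieltjes transform of the Marchenko–Pastur law, in the sense that $\frac{1}{p}\operatorname{tr} A\bigl(Q(z) - \bar{Q}(z)\bigr) \to 0$ almost surely for any deterministic $A$ with bounded operator norm. The paper's "High-dimensional Equivalent" is presented as a generalization of this type of statement to nonlinear models and to generic eigenspectral functionals.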