Deep Neural Regression Collapse

arXiv cs.LG / March 26, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper extends the concept of Neural Collapse from classification to regression, showing that Neural Regression Collapse (NRC) occurs not only at the last layer but also throughout earlier layers in deep neural regression models.
  • It provides evidence that, in the “collapsed” layers, learned features and covariances align with the target’s dimensionality and covariance structure, and that the layer weights’ input subspace matches the feature subspace.
  • The authors demonstrate that the linear prediction error of features in collapsed layers closely matches the model’s overall prediction error, indicating that the internal representation directly supports the model’s predictions.
  • They further show that models exhibiting Deep NRC learn the intrinsic dimension of low-rank targets and analyze the role and necessity of weight decay in inducing Deep NRC.
  • Overall, the work delivers a more complete, multi-layer characterization of the simple structure deep networks can learn in regression settings.
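The feature-subspace claim in the key points above can be illustrated with a small synthetic sketch. The setup below is hypothetical (the paper's actual models and measurements are not reproduced here): features `H` are constructed to lie near a k-dimensional subspace determined by k-dimensional targets `Y`, and the effective dimension is read off from the singular-value spectrum, which is one natural way to check that a layer has "collapsed" to the target dimension.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: n samples, d-dimensional features H from some layer,
# k-dimensional targets Y with k << d. In a collapsed layer, H should
# concentrate in a k-dimensional subspace tied to the targets.
n, d, k = 1000, 64, 3
Y = rng.normal(size=(n, k))
A = rng.normal(size=(k, d))                  # stand-in linear map to feature space
H = Y @ A + 0.01 * rng.normal(size=(n, d))   # features: k-dim signal + small noise

Hc = H - H.mean(axis=0)                      # center before measuring the spectrum
s = np.linalg.svd(Hc, compute_uv=False)      # singular values of the feature matrix
energy = np.cumsum(s**2) / np.sum(s**2)      # cumulative explained variance
k_eff = int(np.searchsorted(energy, 0.99) + 1)  # dims capturing 99% of variance
print(k_eff)                                 # matches k for collapsed features
```

For a genuinely collapsed layer the spectrum drops sharply after k singular values, so `k_eff` recovers the target dimension; for a non-collapsed layer the energy spreads over many more directions.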

Abstract

Neural Collapse is a phenomenon that helps identify sparse and low-rank structures in deep classifiers. Recent work has extended the definition of neural collapse to regression problems, albeit only measuring the phenomenon at the last layer. In this paper, we establish that Neural Regression Collapse (NRC) also occurs below the last layer across different types of models. We show that in the collapsed layers of neural regression models, features lie in a subspace that corresponds to the target dimension, the feature covariance aligns with the target covariance, the input subspace of the layer weights aligns with the feature subspace, and the linear prediction error of the features is close to the overall prediction error of the model. In addition to establishing Deep NRC, we also show that models that exhibit Deep NRC learn the intrinsic dimension of low-rank targets and explore the necessity of weight decay in inducing Deep NRC. This paper provides a more complete picture of the simple structure learned by deep networks in the context of regression.
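Two of the abstract's alignment claims can be sketched concretely on synthetic data (everything below is an illustrative assumption, not the paper's actual measurement protocol): the input subspace of a layer's weights can be compared to the top-k feature subspace via principal angles, and a linear probe on the features gives the "linear prediction error" that, under collapse, should track the model's overall error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical collapsed-layer data: features H near a k-dim subspace,
# targets Y of dimension k, and a stand-in "head" W fit by least squares.
n, d, k = 1000, 64, 3
Y = rng.normal(size=(n, k))
H = Y @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n, d))
W = np.linalg.lstsq(H, Y, rcond=None)[0]     # (d x k) weights acting on H

# Orthonormal bases: top-k right singular vectors of the centered features,
# and the input (column) subspace of the weights W.
U = np.linalg.svd(H - H.mean(axis=0), full_matrices=False)[2][:k].T  # d x k
V = np.linalg.qr(W)[0]                                               # d x k

# Cosines of the principal angles between the two subspaces:
# all close to 1 when the weight input subspace aligns with the features.
cosines = np.linalg.svd(U.T @ V, compute_uv=False)
print(cosines.min())

# Linear prediction error of the features (MSE of the linear probe):
# under collapse this is close to the model's overall prediction error.
probe_err = np.mean((H @ W - Y) ** 2)
print(probe_err)
```

Here the smallest principal-angle cosine is near 1 and the probe error is near the noise floor, mirroring the paper's two diagnostics; on a real network one would run the same computation per layer on features extracted from a forward pass.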