Is Supervised Learning Really That Different from Unsupervised?

arXiv stat.ML / March 30, 2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper argues that supervised learning can be reframed as a two-stage process: first choosing model parameters via an unsupervised criterion, then incorporating the labels y into the outputs without altering the learned parameters.
  • It introduces a new model selection criterion that, unlike cross-validation, can be used even when the labels y are unavailable.
  • For linear ridge regression, the authors derive bounds on the asymptotic out-of-sample risk relative to the optimal asymptotic risk.
  • Experiments and analysis suggest that versions of linear and kernel ridge regression, smoothing splines, k-nearest neighbors, random forests, and neural networks trained without access to y can match their standard supervised counterparts in performance.
  • Overall, the results imply that the conceptual gap between supervised and unsupervised learning may be less fundamental than commonly assumed.

Abstract

We demonstrate how supervised learning can be decomposed into a two-stage procedure, where (1) all model parameters are selected in an unsupervised manner, and (2) the outputs y are added to the model, without changing the parameter values. This is achieved by a new model selection criterion that, in contrast to cross-validation, can also be used without access to y. For linear ridge regression, we bound the asymptotic out-of-sample risk of our method in terms of the optimal asymptotic risk. We also demonstrate that versions of linear and kernel ridge regression, smoothing splines, k-nearest neighbors, random forests, and neural networks, trained without access to y, perform similarly to their standard y-based counterparts. Hence, our results suggest that the difference between supervised and unsupervised learning is less fundamental than it may appear.
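To make the two-stage framing concrete, here is a minimal sketch for linear ridge regression. For any linear smoother, the "parameters" in stage 1 can be taken as the smoother (hat) matrix H, which depends only on X and the penalty λ; the labels y enter only in stage 2, as ŷ = Hy. The paper's actual unsupervised model selection criterion is not specified in this summary, so the rule used below for choosing λ from X alone (mean squared singular value over n) is a hypothetical stand-in, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples, p features, linear signal plus noise.
n, p = 200, 30
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + rng.standard_normal(n)

# Stage 1 (unsupervised): choose the ridge penalty lambda using X only.
# Hypothetical criterion for illustration: the mean squared singular
# value of X divided by n -- a quantity computable without y.
svals = np.linalg.svd(X, compute_uv=False)
lam = np.mean(svals**2) / n

# The hat matrix H = X (X'X + lam I)^{-1} X' is fully determined by X
# and lambda, i.e. by unsupervised quantities alone.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

# Stage 2 (labels enter): y is "added to the model" without changing
# any parameter chosen in stage 1 -- predictions are simply H @ y.
y_hat = H @ y
```

The point of the decomposition is that everything expensive or structural (here, forming H) is fixed before y is ever seen; y only multiplies the precomputed smoother.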