Understanding Self-Supervised Learning via Latent Distribution Matching

arXiv cs.LG / 5/6/2026


Key Points

  • The paper proposes a unifying theoretical framework for self-supervised learning (SSL) by casting it as Latent Distribution Matching (LDM).
  • In LDM, representations are learned to maximize their log-probability under an assumed latent model (alignment) while also maximizing latent entropy to prevent representation collapse (uniformity); a sketch of this objective follows the list.
  • The framework unifies multiple SSL families—independent component analysis, contrastive/non-contrastive SSL, predictive SSL, and stop-gradient methods—under a single viewpoint.
  • Using LDM, the authors derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-style predictor for high-dimensional time series (a generic predictor sketch follows the abstract below).
  • The paper also proves that predictive LDM can produce identifiable latent representations under mild assumptions, even when using nonlinear predictors.
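
As a rough illustration of the alignment-plus-uniformity objective in the bullets above, one plausible way to write LDM is sketched below. The notation is ours, not necessarily the paper's: f is the encoder, q_theta the assumed latent model, and H the entropy of the latent distribution induced by f.

```latex
% Illustrative sketch of an LDM-style objective (notation ours, not the paper's):
% maximize log-probability under the assumed latent model (alignment)
% plus the entropy of the latents (uniformity).
\max_{f}\;
\underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log q_\theta\big(f(x)\big)\right]}_{\text{alignment}}
\;+\;
\underbrace{H\big(f(x)\big)}_{\text{uniformity}}
```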

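A minimal concrete instantiation of that objective, assuming a standard Gaussian latent model for alignment and a log-determinant-of-covariance surrogate for entropy (both assumptions are ours, not necessarily the paper's choices), might look like this:

```python
# Hypothetical LDM-style loss (not the authors' code).
# Assumptions: the assumed latent model is a standard Gaussian, and latent
# entropy is approximated by the log-determinant of the batch covariance.
import torch

def ldm_loss(z: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """z: (batch, dim) latent codes produced by an encoder."""
    # Alignment: log-probability of z under a standard Gaussian (up to a constant).
    alignment = -0.5 * (z ** 2).sum(dim=1).mean()
    # Uniformity: entropy surrogate via log-det of the batch covariance,
    # which penalizes collapse of the latent distribution.
    zc = z - z.mean(dim=0, keepdim=True)
    cov = zc.T @ zc / (z.shape[0] - 1) + eps * torch.eye(z.shape[1])
    entropy = 0.5 * torch.logdet(cov)
    # Maximize alignment + entropy <=> minimize the negative.
    return -(alignment + entropy)

# Example usage with random latents standing in for encoder outputs.
z = torch.randn(256, 32, requires_grad=True)
ldm_loss(z).backward()
```
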
Abstract

Self-supervised learning (SSL) excels at finding general-purpose latent representations from complex data, yet lacks a unifying theoretical framework that explains the diverse existing methods and guides the design of new ones. We cast SSL as latent distribution matching (LDM): learning representations that maximize their log-probability under an assumed latent model (alignment), while maximizing latent entropy to prevent collapse (uniformity). This view unifies independent component analysis with contrastive, non-contrastive, and predictive SSL methods, including stop-gradient approaches. Leveraging LDM, we derive a nonlinear, sampling-free Bayesian filtering model with a Kalman-based predictor for high-dimensional time series. We further prove that predictive LDM yields identifiable latent representations under mild assumptions, even with nonlinear predictors. Overall, LDM clarifies the assumptions behind established SSL methods and provides principled guidance for developing new approaches.
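
For the Kalman-based predictor mentioned in the abstract and key points, the sketch below shows only the generic linear-Gaussian prediction step applied to a latent state; it is not the paper's nonlinear, sampling-free filtering model, and all variable names are illustrative.

```python
# Generic Kalman prediction step for a latent state (illustrative only).
import numpy as np

def kalman_predict(mean, cov, A, Q):
    """Propagate a Gaussian latent belief N(mean, cov) through
    z_{t+1} = A z_t + noise, where the noise covariance is Q."""
    pred_mean = A @ mean
    pred_cov = A @ cov @ A.T + Q
    return pred_mean, pred_cov

# Example: an assumed 8-dimensional latent state with a stable linear predictor.
d = 8
mean, cov = np.zeros(d), np.eye(d)
A, Q = 0.9 * np.eye(d), 0.1 * np.eye(d)
mean, cov = kalman_predict(mean, cov, A, Q)
```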