Learning When the Concept Shifts: Confounding, Invariance, and Dimension Reduction

arXiv stat.ML / 4/2/2026


Key Points

  • The paper addresses domain adaptation under distribution shifts where unobserved confounding can change the optimal “concept” or model for prediction.
  • It introduces a linear structural causal model to handle endogeneity and unobserved confounding, and uses invariant covariate representations to guard against concept shifts and reduce target-domain risk.
  • The authors propose a representation learning approach that finds a lower-dimensional linear subspace and restricts the predictor to that subspace, trading off predictability against stability.
  • Optimization is formulated as a constrained non-convex problem over the Stiefel manifold and solved with a projected-gradient-style method, together with an analysis of the optimization landscape (see the sketch after this list).
  • The work provides theory showing that, with sufficient regularization, nearly all local optima correspond to invariant subspaces resilient to distribution shifts, and it validates the approach on real-world datasets.
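
To fix ideas, here is one hedged reading of the setup the bullets describe. The equations below are an illustrative sketch, not formulas taken from the paper: the specific noise structure, the symbols A, β, γ, and the instability penalty Instab are all assumptions.

```latex
% Sketch of a linear SCM with an unobserved confounder U that drives both
% the covariates X and the response Y; because U shifts across environments,
% the best regression of Y on X (the "concept") shifts with it.
\[
  X = A\,U + E_X, \qquad
  Y = \beta^\top X + \gamma^\top U + \varepsilon .
\]
% Representation learning then searches the Stiefel manifold
% St(d,k) = \{ V \in \mathbb{R}^{d \times k} : V^\top V = I_k \}
% for a subspace basis V and a prediction head \theta confined to it,
% trading source predictability against cross-environment stability:
\[
  \min_{V \in \mathrm{St}(d,k),\; \theta}\;
    \widehat{R}_{\mathrm{src}}\!\left(\theta^\top V^\top X,\, Y\right)
    \;+\; \lambda \,\mathrm{Instab}(V).
\]
```

On this reading, λ is the regularization knob in the landscape result: for λ large enough, local optima of the interpolated objective should favor subspaces on which the induced predictor is invariant across environments.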

Abstract

Practitioners often face the challenge of deploying prediction models in new environments with shifted distributions of covariates and responses. With observational data, such shifts are often driven by unobserved confounding, which can in fact alter the concept of which model is best. This paper studies distribution shifts in the domain adaptation problem with unobserved confounding. We postulate a linear structural causal model to account for endogeneity and unobserved confounding, and we leverage exogenous invariant covariate representations to cure concept shifts and improve target prediction. We propose a data-driven representation learning method that optimizes for a lower-dimensional linear subspace and a prediction model confined to that subspace. The method operates on a non-convex objective, one that interpolates between predictability and stability, constrained to the Stiefel manifold, using an analog of projected gradient descent. We analyze the optimization landscape and prove that, provided sufficient regularization, nearly all local optima align with an invariant linear subspace resilient to distribution shifts, so the method achieves a nearly ideal gap between target and source risk. We validate the method and theory on real-world datasets to illustrate the tradeoff between predictability and stability.
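
The optimization loop itself, a projected-gradient analog constrained to the Stiefel manifold, is standard enough to sketch. The Python below is a minimal illustration and not the authors' implementation: the toy objective (pooled least-squares risk plus a penalty on how much per-environment regression heads disagree), the numerical gradient, and every function name are assumptions; only the gradient-step-then-retract pattern is the generic technique.

```python
import numpy as np

def project_stiefel(M):
    # Nearest matrix with orthonormal columns (Frobenius norm): if
    # M = U S Vt is the thin SVD, the projection onto St(d, k) is U @ Vt.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def objective(V, envs, lam):
    # Toy stand-in for the paper's objective (an assumption): pooled
    # least-squares risk of heads fit on the k-dim features X @ V
    # (predictability), plus the spread of those heads across
    # environments (stability), weighted by lam.
    heads, risk = [], 0.0
    for X, y in envs:
        Z = X @ V
        w, *_ = np.linalg.lstsq(Z, y, rcond=None)
        heads.append(w)
        risk += np.mean((Z @ w - y) ** 2)
    heads = np.stack(heads)
    instability = np.mean((heads - heads.mean(axis=0)) ** 2)
    return risk / len(envs) + lam * instability

def numerical_grad(V, envs, lam, eps=1e-5):
    # Central-difference gradient, for brevity; a real implementation
    # would use a closed form or autodiff.
    G = np.zeros_like(V)
    for idx in np.ndindex(*V.shape):
        E = np.zeros_like(V)
        E[idx] = eps
        G[idx] = (objective(V + E, envs, lam)
                  - objective(V - E, envs, lam)) / (2 * eps)
    return G

def fit_invariant_subspace(envs, d, k, lam=10.0, step=1e-2, iters=200, seed=0):
    # Projected-gradient analog on the Stiefel manifold: take a Euclidean
    # gradient step, then retract onto St(d, k) via the SVD projection.
    rng = np.random.default_rng(seed)
    V = project_stiefel(rng.standard_normal((d, k)))
    for _ in range(iters):
        V = project_stiefel(V - step * numerical_grad(V, envs, lam))
    return V
```

The SVD step returns the nearest matrix with orthonormal columns in Frobenius norm, which is what keeps the iterates on St(d, k); a QR-based retraction would work as well. Increasing lam should push the learned V toward directions whose per-environment heads agree, mirroring the paper's predictability-stability tradeoff.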