Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

arXiv stat.ML / 5/1/2026


Key Points

  • The paper presents a method to learn invariant features W=f(X) that predict Y while avoiding confounding effects from variables Z that affect both X and Y.
  • It replaces a hard-to-implement penalty on any statistical dependence between W and Z conditioned on Y with a plain independence requirement between W and a transformed variable Z_Y=T(Z,Y), where T solves the Monge optimal transport barycenter problem for Z given Y.
  • In the Gaussian setting considered in the paper, the conditional-independence requirement and the plain independence requirement on Z_Y are equivalent.
  • When true confounders Z are unknown, the approach can use measurable contextual surrogates S; in the Gaussian case this substitution remains exact if the covariance matrix Σ_ZS has full range.
  • The resulting linear feature extractor has a closed-form solution using the top d eigenvectors of a known matrix, and the framework is argued to extend with minimal changes to more general non-Gaussian/non-linear settings.
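
The Gaussian equivalence in the key points can be checked with a short derivation. The sketch below is not taken from the paper: it assumes (Z, Y) jointly Gaussian with invertible \Sigma_{YY}, a quadratic transport cost, and a linear feature W = A^T X, and the cross-covariance notation \Sigma_{AB} is ours.

```latex
% Assumptions (ours): (X, Y, Z) jointly Gaussian, quadratic cost, \Sigma_{YY} invertible.
% The conditional covariance of Z given Y = y does not depend on y, so the
% conditionals are translates of one Gaussian; their barycenter keeps that
% covariance with mean \mu_Z, and the Monge map is a translation.
\begin{align*}
Z \mid Y = y &\sim \mathcal{N}\!\left(\mu_Z + \Sigma_{ZY}\Sigma_{YY}^{-1}(y - \mu_Y),\;
                 \Sigma_{ZZ} - \Sigma_{ZY}\Sigma_{YY}^{-1}\Sigma_{YZ}\right), \\
Z_Y = T(Z, Y) &= Z - \Sigma_{ZY}\Sigma_{YY}^{-1}(Y - \mu_Y), \\
\operatorname{Cov}(W, Z_Y) &= \Sigma_{WZ} - \Sigma_{WY}\Sigma_{YY}^{-1}\Sigma_{YZ}
                 = \operatorname{Cov}(W, Z \mid Y).
\end{align*}
```

Because (W, Z, Y) are jointly Gaussian under these assumptions, a vanishing conditional cross-covariance is the same as conditional independence of W and Z given Y, and a vanishing Cov(W, Z_Y) is the same as independence of W and Z_Y, so the two criteria coincide.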

Abstract

A methodology is developed to extract d invariant features W=f(X) that predict a response variable Y without being confounded by variables Z that may influence both X and Y. The methodology's main ingredient is the penalization of any statistical dependence between W and Z conditioned on Y, replaced by the more readily implementable plain independence between W and the random variable Z_Y = T(Z,Y) that solves the [Monge] Optimal Transport Barycenter Problem for Z | Y. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders Z are unknown, other measurable contextual variables S can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix Σ_ZS has full range. The resulting linear feature extractor adopts a closed form in terms of the first d eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.
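
To make the abstract's ingredients concrete, here is a minimal NumPy sketch under stated assumptions. It builds Z_Y as the residual of a linear regression of Z on Y, which is what the barycenter map reduces to for jointly Gaussian data with quadratic cost, and then extracts linear features from the top d eigenvectors of a matrix. The abstract does not say which matrix the paper uses, so the objective here (predictive covariance with Y minus `lam` times covariance with Z_Y) is a hypothetical stand-in, and the names `cross_cov`, `barycenter_residual`, `linear_extractor`, and `lam` are ours.

```python
import numpy as np


def cross_cov(A, B):
    """Sample cross-covariance between the columns of A and B (rows are samples)."""
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    return A0.T @ B0 / (len(A) - 1)


def barycenter_residual(Z, Y):
    """Z_Y = Z - Sigma_ZY Sigma_YY^{-1} (Y - mean(Y)).

    Assumption: jointly Gaussian (Z, Y) and quadratic cost, so the
    barycenter map is a y-dependent translation of Z.
    """
    coef = np.linalg.solve(cross_cov(Y, Y), cross_cov(Y, Z))  # Sigma_YY^{-1} Sigma_YZ
    return Z - (Y - Y.mean(axis=0)) @ coef


def linear_extractor(X, Y, Z, d, lam):
    """Top-d eigenvectors of a HYPOTHETICAL matrix M (not the paper's).

    M = Sigma_XY Sigma_YX - lam * Sigma_XZy Sigma_ZyX, so maximizing
    tr(A^T M A) over orthonormal A rewards covariance with Y and
    penalizes covariance with Z_Y.
    """
    S_xy = cross_cov(X, Y)
    S_xz = cross_cov(X, barycenter_residual(Z, Y))
    M = S_xy @ S_xy.T - lam * (S_xz @ S_xz.T)
    _, evecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    return evecs[:, ::-1][:, :d]        # columns: top-d eigenvectors


# Synthetic linear-Gaussian data in which Z influences both X and Y.
rng = np.random.default_rng(0)
n, p, k = 5000, 5, 2
Z = rng.normal(size=(n, k))
Y = Z @ rng.normal(size=(k, 1)) + rng.normal(size=(n, 1))
X = Y @ rng.normal(size=(1, p)) + Z @ rng.normal(size=(k, p)) + rng.normal(size=(n, p))

A = linear_extractor(X, Y, Z, d=2, lam=10.0)
W = X @ A
print("max |Cov(W, Z_Y)|:", np.abs(cross_cov(W, barycenter_residual(Z, Y))).max())
```

Raising `lam` trades predictive covariance for invariance. The surrogate setting described in the abstract would correspond to passing measured contextual variables S in place of `Z`, which this sketch does not validate.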