Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case

arXiv stat.ML / 5/1/2026

Key Points

The paper presents a method to learn invariant features W=f(X) that predict Y while avoiding confounding effects from variables Z that affect both X and Y.
Rather than directly penalizing the statistical dependence between W and Z conditioned on Y, it imposes the more readily implementable plain independence between W and a transformed variable Z_Y=T(Z,Y), where T solves the Monge optimal transport barycenter problem for Z|Y.
In the Gaussian setting considered in the paper, the conditional-independence statement and the transformed independence statement are equivalent.
When true confounders Z are unknown, the approach can use measurable contextual surrogates S; in the Gaussian case this substitution remains exact if the covariance matrix Σ_ZS has full range.
The resulting linear feature extractor has a closed-form solution using the top d eigenvectors of a known matrix, and the framework is argued to extend with minimal changes to more general non-Gaussian/non-linear settings.
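
One way to see why the Gaussian equivalence is plausible is sketched below using standard Gaussian identities. This is not the paper's own derivation. It assumes that (W, Z, Y) are jointly Gaussian with invertible \Sigma_{YY}, and the block-covariance notation is introduced here for illustration only. Under that assumption the conditional laws of Z given Y=y share one covariance and differ only in their means, so the barycenter map reduces to a translation, and both independence statements come down to the same matrix vanishing.

```latex
% Assumption: (W, Z, Y) jointly Gaussian, \Sigma_{YY} invertible; notation is illustrative.
\begin{align*}
Z \mid Y = y &\sim \mathcal{N}\!\left(\mu_Z + \Sigma_{ZY}\Sigma_{YY}^{-1}(y-\mu_Y),\;
                 \Sigma_{ZZ} - \Sigma_{ZY}\Sigma_{YY}^{-1}\Sigma_{YZ}\right),\\
Z_Y = T(Z,Y) &= Z - \Sigma_{ZY}\Sigma_{YY}^{-1}(Y-\mu_Y),\\
\operatorname{Cov}(W, Z_Y) &= \Sigma_{WZ} - \Sigma_{WY}\Sigma_{YY}^{-1}\Sigma_{YZ}
                            = \operatorname{Cov}(W, Z \mid Y).
\end{align*}
```

Since jointly Gaussian variables are independent exactly when they are uncorrelated, W is independent of Z_Y if and only if the last matrix is zero, which is also the condition for W and Z to be independent given Y.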

A methodology is developed to extract $d$ invariant features $W=f(X)$ that predict a response variable $Y$ without being confounded by variables $Z$ that may influence both $X$ and $Y$. The methodology's main ingredient is the penalization of any statistical dependence between $W$ and $Z$ conditioned on $Y$, replaced by the more readily implementable plain independence between $W$ and the random variable $Z_Y = T(Z,Y)$ that solves the [Monge] Optimal Transport Barycenter Problem for $Z\mid Y$. In the Gaussian case considered in this article, the two statements are equivalent. When the true confounders $Z$ are unknown, other measurable contextual variables $S$ can be used as surrogates, a replacement that involves no relaxation in the Gaussian case if the covariance matrix $\Sigma_{ZS}$ has full range. The resulting linear feature extractor adopts a closed form in terms of the first $d$ eigenvectors of a known matrix. The procedure extends with little change to more general, non-Gaussian / non-linear cases.
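
The following is a minimal numerical sketch of the Gaussian case, not the paper's implementation. It assumes the standard fact that, for jointly Gaussian variables, the barycenter map amounts to subtracting from Z its linear regression on Y. The function names (`cross_cov`, `barycenter_map_gaussian`, `conditional_cross_cov`), the synthetic data, and the extractor matrix `A` are all hypothetical. The script checks that the plain cross-covariance between W and Z_Y coincides with the conditional cross-covariance between W and Z given Y.

```python
import numpy as np

def cross_cov(A, B):
    """Sample cross-covariance matrix between the columns of A and B."""
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    return Ac.T @ Bc / A.shape[0]

def barycenter_map_gaussian(Z, Y):
    """Z_Y = Z - S_zy S_yy^{-1} (Y - mean(Y)).

    Assumed Gaussian form of T: each sample of Z is shifted by its
    estimated conditional-mean offset given Y.
    """
    S_zy = cross_cov(Z, Y)
    S_yy = cross_cov(Y, Y)
    B = np.linalg.solve(S_yy, S_zy.T).T  # equals S_zy S_yy^{-1}
    return Z - (Y - Y.mean(axis=0)) @ B.T

def conditional_cross_cov(W, Z, Y):
    """Gaussian conditional cross-covariance: S_wz - S_wy S_yy^{-1} S_yz."""
    S_yy = cross_cov(Y, Y)
    return cross_cov(W, Z) - cross_cov(W, Y) @ np.linalg.solve(S_yy, cross_cov(Y, Z))

# Synthetic linear-Gaussian data (hypothetical): Z influences both X and Y.
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 2))
Y = Z @ np.array([[1.0], [-0.5]]) + rng.normal(size=(n, 1))
X = (Y @ np.array([[1.0, 0.5, 0.0]])
     + Z @ rng.normal(size=(2, 3))
     + rng.normal(size=(n, 3)))

A = rng.normal(size=(3, 2))  # arbitrary linear extractor, not the paper's optimum
W = X @ A
Z_Y = barycenter_map_gaussian(Z, Y)

# The two dependence measures agree up to floating-point error.
gap = np.abs(cross_cov(W, Z_Y) - conditional_cross_cov(W, Z, Y)).max()
print("max |Cov(W, Z_Y) - Cov(W, Z | Y)| =", gap)
# By construction Z_Y is uncorrelated with Y.
print("max |Cov(Z_Y, Y)| =", np.abs(cross_cov(Z_Y, Y)).max())
```

In this sketch, driving the entries of `cross_cov(W, Z_Y)` to zero is the plain-covariance stand-in for the conditional criterion. How the paper turns that constraint into its closed-form eigenvector solution is not reproduced here.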
