Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap

arXiv stat.ML / 4/2/2026

Key Points

  • The paper addresses causal effect estimation when overlap (also known as positivity) is weak: many popular estimators become high-variance and brittle when feature distributions differ strongly across treatment groups, a problem that worsens in high dimensions.
  • It proposes “deconfounding scores,” a class of feature representations that preserve both identification and the target of estimation; the classical propensity and prognostic scores are two special cases.
  • The authors characterize the search for a representation with better overlap as minimizing an overlap divergence subject to a deconfounding score constraint.
  • For a broad family of generalized linear models with Gaussian features, the paper derives closed-form expressions for a class of deconfounding scores and shows that prognostic scores are overlap-optimal within this class.
  • The authors report extensive experiments that assess this overlap behavior empirically.
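
For readers who prefer symbols, the classical objects named in the points above can be written as follows. These are the standard textbook definitions, and the notation (X for features, T for a binary treatment, Y(0) for the untreated potential outcome, Ψ for a prognostic score) is an assumption of this sketch rather than the paper's own. The paper's formal definition of a deconfounding score and its overlap divergence are not reproduced here.

```latex
% Assumed notation: X features, T \in \{0,1\} treatment, Y(0) untreated potential outcome.
% Requires amsmath. \perp is used here to denote (conditional) independence.
\begin{align*}
e(x) &= \Pr(T = 1 \mid X = x)
  && \text{(propensity score)} \\
0 &< e(x) < 1 \quad \text{for all } x \text{ in the support of } X
  && \text{(overlap, also called positivity)} \\
X &\perp T \mid e(X)
  && \text{(balancing property of the propensity score)} \\
Y(0) &\perp X \mid \Psi(X)
  && \text{(defining property of a prognostic score } \Psi\text{)}
\end{align*}
```

Weak overlap corresponds to e(x) being close to 0 or 1 for a substantial share of units, which is what inflates the variance of weighting-based estimators.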

Abstract

Overlap, also known as positivity, is a key condition for causal treatment effect estimation. Many popular estimators suffer from high variance and become brittle when features differ strongly across treatment groups. This is especially challenging in high dimensions: the curse of dimensionality can make overlap implausible. To address this, we propose a class of feature representations called deconfounding scores, which preserve both identification and the target of estimation; the classical propensity and prognostic scores are two special cases. We characterize the problem of finding a representation with better overlap as minimizing an overlap divergence under a deconfounding score constraint. We then derive closed-form expressions for a class of deconfounding scores under a broad family of generalized linear models with Gaussian features and show that prognostic scores are overlap-optimal within this class. We conduct extensive experiments to assess this behavior empirically.
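
The two phenomena described in the abstract can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the paper: standard Gaussian features, a logistic treatment model with hypothetical coefficients `beta`, and a linear outcome model whose hypothetical coefficients `gamma` make `X @ gamma` a prognostic score. The two diagnostics used here (the share of units with extreme true propensity, and a standardized mean difference between treatment groups) are crude proxies chosen for illustration, not the paper's overlap divergence.

```python
import numpy as np


def simulate(d, n=20000, seed=0):
    """Draw Gaussian features and a binary treatment from a logistic model.

    beta and gamma are hypothetical coefficients chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    beta = np.full(d, 0.5)        # every feature nudges treatment a little
    gamma = np.zeros(d)
    gamma[0] = 1.0                # outcome depends on the first feature only
    e = 1.0 / (1.0 + np.exp(-(X @ beta)))   # true propensity score
    T = rng.random(n) < e                   # boolean treatment indicator
    return X, T, e, beta, gamma


def extreme_fraction(e, eta=0.05):
    """Share of units whose true propensity lies outside [eta, 1 - eta]."""
    return float(np.mean((e < eta) | (e > 1.0 - eta)))


def smd(score, T):
    """Absolute standardized mean difference of a 1-D score across groups."""
    diff = score[T].mean() - score[~T].mean()
    pooled_sd = np.sqrt(0.5 * (score[T].var() + score[~T].var()))
    return float(abs(diff) / pooled_sd)


# 1) Overlap degrades as the dimension grows, even with small per-feature effects.
fracs = {d: extreme_fraction(simulate(d)[2]) for d in (2, 10, 50)}

# 2) In 50 dimensions, compare two 1-D representations of the features.
X, T, e, beta, gamma = simulate(50)
smd_propensity = smd(X @ beta, T)    # logit of the propensity score
smd_prognostic = smd(X @ gamma, T)   # prognostic score under the assumed outcome model

print("share of extreme propensities by dimension:", fracs)
print("SMD of propensity logit:", round(smd_propensity, 3))
print("SMD of prognostic score:", round(smd_prognostic, 3))
```

In this setup the share of extreme propensities rises with the dimension, and the treatment groups are far less separated along the prognostic score than along the propensity logit. That is consistent with the abstract's claim about prognostic scores, but it is only a toy check in one hand-picked configuration.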