Intervening to Learn and Compose Causally Disentangled Representations

arXiv stat.ML / 4/3/2026

Key Points

  • The paper argues that generative models do not necessarily need to choose between high expressivity and structured latent representations.
  • It introduces a “context module” that can be added to an otherwise arbitrary black-box generative model to learn causally disentangled concepts.
  • The method is inspired by the notion of intervention in causal models: the module's architecture is selectively modified during training, allowing it to learn a compact joint model across different contexts.
  • The authors report that the learned representations support compositional out-of-distribution (OOD) generation on both real and simulated datasets.
  • They provide theoretical support via a new identifiability result extending prior work on recovering structured representations.

Abstract

In designing generative models, it is commonly believed that in order to learn useful latent structure, we face a fundamental tension between expressivity and structure. In this paper we challenge this view by proposing a new approach to training arbitrarily expressive generative models that simultaneously learn causally disentangled concepts. This is accomplished by adding a simple context module to an arbitrarily complex black-box model, which learns to process concept information by implicitly inverting linear representations from the model's encoder. Inspired by the notion of intervention in a causal model, our module selectively modifies its architecture during training, allowing it to learn a compact joint model over different contexts. We show how adding this module leads to causally disentangled representations that can be composed for out-of-distribution generation on both real and simulated data. The resulting models can be trained end-to-end or fine-tuned from pre-trained models. To further validate our proposed approach, we prove a new identifiability result that extends existing work on identifying structured representations.
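To make the abstract's mechanism concrete, here is a minimal sketch of the core idea: a small "context module" sits on top of a black-box encoder, linearly reads out concepts from the latent code, and applies a per-context intervention mask that zeroes out the concepts that context intervenes on. All names, shapes, and the masking scheme here are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def context_module(z, W, mask):
    """Hypothetical context module (illustrative, not the paper's code).

    W @ z linearly reads concepts out of the black-box encoder's latent z
    (the "implicit inversion" of a linear representation); the binary mask
    emulates a causal intervention by zeroing the intervened concepts
    for the current training context.
    """
    concepts = W @ z          # linear read-out of concepts from the latent
    return mask * concepts    # mask == 0 on concepts this context intervenes on

# Toy setup: a 4-dimensional latent code and 3 concepts.
W = rng.normal(size=(3, 4))   # learned read-out matrix (random here)
z = rng.normal(size=4)        # latent from the black-box encoder

observational = context_module(z, W, np.ones(3))                # no intervention
interventional = context_module(z, W, np.array([1.0, 0.0, 1.0]))  # concept 1 intervened on
```

Training the same module weights across many such masked contexts is what, per the abstract, pushes the model toward a compact joint representation whose concepts can later be recombined for out-of-distribution generation.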