Causality-Encoded Diffusion Models for Interventional Sampling and Edge Inference

arXiv stat.ML / 4/24/2026


Key Points

  • The paper introduces a “causality-encoded” diffusion model that integrates a known directed acyclic graph (DAG) by training conditional diffusion models aligned with the graph’s factorization.
  • It provides an interventional sampling mechanism where intervened variables are fixed and causal effects are propagated through the DAG during reverse diffusion, aiming to recover both observational and interventional distributions.
  • The authors develop a resampling-based statistical test for directed edges that produces null replicates under a candidate graph, along with theoretical convergence and type I error control guarantees.
  • Experiments and an application to flow cytometry data show improved recovery of interventional distributions versus baselines and practical performance in evaluating disputed signalling connections.
  • The theoretical convergence rates depend on the maximum local dimension rather than the ambient dimension, which suggests that estimation scales more favorably as the total number of variables grows.
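
For readers less familiar with the terminology, the "graph's factorization" and the effect of an intervention can be written out in standard causal-inference notation. The symbols below (d variables x_1, ..., x_d, parent sets pa(j), intervened index set S, fixed values a) are assumptions introduced here for illustration and may not match the paper's own notation.

```latex
% Observational distribution: DAG (Markov) factorization
p(x_1, \dots, x_d) = \prod_{j=1}^{d} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)

% Interventional distribution under do(x_S = a): truncated factorization
p\bigl(x \mid \mathrm{do}(x_S = a)\bigr)
  = \mathbf{1}\{x_S = a\} \prod_{j \notin S} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)
```

Each factor conditions only on a node's parents, which is consistent with the summary's point that rates depend on a local rather than the ambient dimension.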

Abstract

Standard diffusion models are flexible estimators of complex distributions, but they do not encode causal structures and therefore do not by themselves support causal analysis. We propose a causality-encoded diffusion framework that incorporates a known directed acyclic graph by training conditional diffusion models consistent with the graph factorisation. The resulting sampler approximately recovers the observational distribution and enables interventional sampling by fixing intervened variables while propagating effects through the graph during reverse diffusion. Building on this interventional simulator, we develop a resampling-based test for directed edges that generates null replicates under a candidate graph. We establish convergence guarantees for observational and interventional distribution estimation, with rates governed by the maximum local dimension rather than the ambient dimension, and prove asymptotic control of type I error for the edge test. Simulations show improved interventional distribution recovery relative to baselines, with near-nominal size and favourable power in inference. An application to flow cytometry data demonstrates practical utility of the proposed method in assessing disputed signalling linkages.
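
The idea of "fixing intervened variables while propagating effects through the graph" can be illustrated with a minimal sketch. This is not the authors' method: it uses plain ancestral sampling on a hypothetical three-node chain X -> Y -> Z, and the function `sample_node` is a linear-Gaussian stand-in for what would be a trained conditional diffusion sampler. The graph, the weights, and all names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy DAG: X -> Y -> Z, stored as node -> list of parents.
PARENTS = {"X": [], "Y": ["X"], "Z": ["Y"]}
ORDER = ["X", "Y", "Z"]  # a topological order of the DAG
# Hypothetical linear effects of each parent on its child.
WEIGHTS = {"Y": {"X": 2.0}, "Z": {"Y": -1.0}}


def sample_node(node, values, n, rng):
    """Stand-in for a trained conditional sampler p(node | parents).

    Here: a linear function of the parents plus standard normal noise.
    """
    mean = np.zeros(n)
    for parent in PARENTS[node]:
        mean += WEIGHTS[node][parent] * values[parent]
    return mean + rng.normal(size=n)


def sample(n, interventions=None, seed=0):
    """Draw n joint samples, optionally under do(node = value)."""
    interventions = interventions or {}
    rng = np.random.default_rng(seed)
    values = {}
    for node in ORDER:
        if node in interventions:
            # Intervened node: ignore its parents and fix the value.
            values[node] = np.full(n, float(interventions[node]))
        else:
            # Non-intervened node: sample given (possibly fixed) parents.
            values[node] = sample_node(node, values, n, rng)
    return values


obs = sample(50_000)                             # observational samples
do_y = sample(50_000, interventions={"Y": 1.0})  # samples under do(Y = 1)
```

In this toy setting, intervening on Y leaves its upstream cause X unchanged while shifting the downstream variable Z, which is the qualitative behavior an interventional sampler is meant to reproduce.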