Consistent Bayesian causal discovery for structural equation models with equal error variances

arXiv stat.ML / 3/25/2026


Key Points

  • The paper studies Bayesian causal discovery for linear acyclic SEMs where error terms are independent but not necessarily Gaussian, with the key identifiability assumption that all error variances are equal.
  • It proves a characterization result: the sum, over all variables, of the minimum expected squared prediction error (each variable predicted by the best linear combination of its parents) is minimized if and only if the graph is a supergraph of the true causal DAG.
  • Based on this property, the authors propose a Bayesian DAG selection approach that uses a working Gaussian SEM with equal error variances and independent g-priors over SEM coefficients.
  • They show that the method consistently recovers the true causal graph without distributional assumptions beyond the stated equal-variance and independence conditions, and they illustrate this with a simulation study.
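
The supergraph property in the key points can be checked numerically at the population level. The sketch below is an illustration only, not the authors' method: the three-variable chain 0 → 1 → 2, its coefficients, and the helper names `implied_covariance` and `total_prediction_error` are all assumptions made for this example. It computes the covariance implied by an equal-variance linear SEM and then sums, for several candidate graphs, the residual variance of each variable after the best linear prediction from its candidate parents.

```python
import numpy as np


def implied_covariance(B, sigma2):
    """Covariance of X when X = B X + eps with independent errors, Var(eps_j) = sigma2.

    B[j, k] is the coefficient of X_k in the equation for X_j (hypothetical layout).
    """
    p = B.shape[0]
    A = np.linalg.inv(np.eye(p) - B)  # X = A eps
    return sigma2 * A @ A.T


def total_prediction_error(Sigma, parents):
    """Sum over j of the minimum expected squared error of predicting X_j
    by the best linear combination of its candidate parents (zero-mean X)."""
    total = 0.0
    for j, pa in parents.items():
        pa = list(pa)
        resid_var = Sigma[j, j]
        if pa:
            S_pp = Sigma[np.ix_(pa, pa)]
            s_pj = Sigma[pa, j]
            # Residual variance: Sigma_jj - Sigma_jP Sigma_PP^{-1} Sigma_Pj
            resid_var -= s_pj @ np.linalg.solve(S_pp, s_pj)
        total += resid_var
    return total


# Assumed true model: 0 -> 1 -> 2 with equal error variance 1.
B_true = np.zeros((3, 3))
B_true[1, 0] = 0.8
B_true[2, 1] = -0.5
Sigma = implied_covariance(B_true, sigma2=1.0)

# Candidate graphs, given as {node: list of parents}.
candidates = {
    "true": {0: [], 1: [0], 2: [1]},
    "supergraph": {0: [], 1: [0], 2: [0, 1]},  # adds the edge 0 -> 2
    "reversed": {0: [1], 1: [2], 2: []},       # 2 -> 1 -> 0
    "empty": {0: [], 1: [], 2: []},
}
scores = {name: total_prediction_error(Sigma, pa) for name, pa in candidates.items()}

for name, score in scores.items():
    print(f"{name:>10}: {score:.4f}")
```

Under these assumed coefficients, the true graph and its supergraph both reach the minimum of 3 (the number of variables times the common error variance), while the reversed chain scores about 3.18 and the empty graph 4.05. The calculation uses only second moments, which is consistent with the claim that no Gaussian assumption on the errors is needed.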

Abstract

We consider the problem of recovering the true causal structure among a set of variables, generated by a linear acyclic structural equation model (SEM) with the error terms being independent, not necessarily Gaussian, and having equal variances. It is well known that the true underlying directed acyclic graph (DAG) encoding the causal structure is uniquely identifiable under this assumption. Interestingly, in this setting, it further holds that the sum of minimum expected squared errors for every variable, when each is predicted by the best linear combination of its parent variables, is minimised if and only if the causal structure is represented by a supergraph of the true DAG. In this work, we propose a Bayesian DAG selection method, where the working model assumes a Gaussian SEM with equal error variances, and employ independent g-priors on each set of SEM coefficients. Furthermore, we utilise the aforementioned key property to establish that the proposed method recovers the true graph consistently without any additional distributional assumption, and illustrate it with a simulation study.
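
For readers who prefer symbols, the key property in the abstract can be written as follows. The notation (zero-mean variables, $G_0$ for the true DAG, $\mathrm{pa}_G(j)$ for the parents of node $j$ in $G$, and $R(G)$ for the summed prediction error) is assumed here for illustration and may differ from the paper's.

```latex
% Assumed notation, not taken from the paper.
% Zero-mean linear acyclic SEM on X_1, ..., X_p with true DAG G_0
% and mutually independent errors of equal variance:
X_j = \sum_{k \in \mathrm{pa}_{G_0}(j)} \beta_{jk} X_k + \varepsilon_j,
\qquad \mathrm{Var}(\varepsilon_j) = \sigma^2, \quad j = 1, \dots, p.

% Summed minimum expected squared prediction error for a candidate DAG G:
R(G) = \sum_{j=1}^{p} \min_{b}\,
       \mathbb{E}\Big[ \Big( X_j - \sum_{k \in \mathrm{pa}_{G}(j)} b_k X_k \Big)^{2} \Big].

% Property stated in the abstract (minimum taken over all DAGs G'):
R(G) = \min_{G'} R(G') \iff G \supseteq G_0 .
```

When $G = G_0$, each residual equals $\varepsilon_j$, so the minimum value is $p\sigma^2$. Because every supergraph attains the same minimum, the prediction error alone cannot separate $G_0$ from its supergraphs.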