On the Number of Conditional Independence Tests in Constraint-based Causal Discovery

arXiv stat.ML / 3/24/2026


Key Points

  • The paper addresses the high cost of conditional independence testing in constraint-based causal discovery, noting that common algorithms such as PC can require a number of tests that is, in the worst case, exponential in the maximum degree of the causal graph.
  • It proposes a new constraint-based algorithm with improved worst-case complexity, achieving a number of conditional independence tests scaling as p^O(s), where p is the number of variables and s is the maximum undirected clique size of the essential graph.
  • The authors prove a lower bound showing that any constraint-based causal discovery algorithm must use at least 2^Ω(s) conditional independence tests, indicating the proposed method is near exponent-optimal (up to a logarithmic factor).
  • Simulations and experiments on semi-synthetic gene-expression data and real-world datasets support the theoretical claims, showing fewer conditional independence tests than existing methods.
  • Overall, the work clarifies both the theoretical limits and the achievable complexity of constraint-based causal discovery, without requiring assumptions beyond those of the standard constraint-based framework.
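
To make the worst-case claim about PC-style search concrete, the sketch below counts how many conditioning sets could be tried for a single pair of variables when every subset of the remaining neighbors is enumerated. This is only a counting illustration under that simplifying assumption, not the paper's algorithm or an implementation of PC; the function name `conditioning_sets_per_pair` and its parameters are hypothetical.

```python
from math import comb


def conditioning_sets_per_pair(num_candidates: int, max_size: int) -> int:
    """Count the subsets of size 0..max_size drawn from num_candidates variables.

    Hypothetical helper: it models one pair of variables whose conditioning
    sets are drawn from num_candidates other neighbors.
    """
    return sum(comb(num_candidates, k) for k in range(max_size + 1))


# Assumption: a node of degree d leaves d - 1 other neighbors to condition on,
# and in the worst case every subset of them is tested.
for d in (5, 10, 20):
    print(d, conditioning_sets_per_pair(d - 1, d - 1))
```

With all subset sizes allowed, the count equals 2^(d-1), so it doubles with each additional neighbor. This is the kind of growth in the maximum degree that the summary describes.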

Abstract

Learning causal relations from observational data is a fundamental problem with wide-ranging applications across many fields. Constraint-based methods infer the underlying causal structure by performing conditional independence tests. However, existing algorithms such as the prominent PC algorithm need to perform a large number of independence tests, which in the worst case is exponential in the maximum degree of the causal graph. Despite extensive research, it remains unclear whether there exist algorithms with better complexity without additional assumptions. Here, we establish an algorithm that achieves a better complexity of $p^{\mathcal{O}(s)}$ tests, where $p$ is the number of nodes in the graph and $s$ denotes the maximum undirected clique size of the underlying essential graph. Complementing this result, we prove that any constraint-based algorithm must perform at least $2^{\Omega(s)}$ conditional independence tests, establishing that our proposed algorithm achieves exponent-optimality up to a logarithmic factor in terms of the number of conditional independence tests needed. Finally, we validate our theoretical findings through simulations and through experiments on semi-synthetic gene-expression data and real-world data, demonstrating the efficiency of our algorithm compared to existing methods in terms of the number of conditional independence tests needed.
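
One way to read "exponent-optimality up to a logarithmic factor" is to write both bounds with base 2 and compare the exponents. The short derivation below is an interpretive sketch based only on the two bounds stated in the abstract; the base-2 logarithm is a choice made here for illustration.

```latex
\[
  \underbrace{p^{\mathcal{O}(s)}}_{\text{upper bound}}
  \;=\; 2^{\mathcal{O}(s \log_2 p)}
  \qquad\text{versus}\qquad
  \underbrace{2^{\Omega(s)}}_{\text{lower bound}}
\]
\[
  \text{exponent gap: }\quad
  \frac{s \log_2 p}{s} \;=\; \log_2 p .
\]
```

Under this reading, the upper and lower bounds agree in the exponent except for a multiplicative factor of order $\log p$.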