The Causal Description Gap: Information-Theoretic Separations Across Pearl's Hierarchy

arXiv stat.ML / 5/5/2026


Key Points

  • The paper studies a quantitative version of Pearl’s causal hierarchy by asking how many extra bits are required to describe higher-rung causal answers when lower-rung answers are already known.
  • It introduces “query-class description length,” using Kolmogorov complexity of an answer oracle induced by a structural causal model (SCM) for a class of queries.
  • The authors construct binary acyclic SCMs where the observational distribution has constant description length, but the single-variable interventional answer oracle requires Θ(n²) bits, yielding an order-optimal quadratic separation in dense regimes.
  • They prove a degree-sensitive upper bound for finite-gate-schema SCMs of indegree d: the observational–interventional gap is at most O(nd log(en/d) + n log n). The quadratic gap also persists under ε-accurate total-variation descriptions for every fixed ε < 1/4.
  • At the next rung, even the full hard-do interventional oracle can leave a Θ(n) counterfactual description gap. A general ambiguity-to-bits theorem and a Shannon analogue show that these gaps equal the logarithm of the residual higher-rung ambiguity, up to lower-order terms.
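
The basic phenomenon behind the observational-interventional gap can be illustrated with a two-variable toy example. This is a minimal sketch and not the paper's construction: the names `scm_a`, `scm_b`, and `distribution` are hypothetical, and the two SCMs (X causes Y versus Y causes X, each driven by one fair noise bit) are assumed purely for illustration. Both models induce the same observational distribution, yet they answer the query do(X = 1) differently, so knowing the observational answers does not pin down the interventional ones.

```python
from collections import Counter
from fractions import Fraction


def scm_a(u, do=None):
    """Toy SCM A (assumed for illustration): X := U, Y := X."""
    do = do or {}
    x = do.get("X", u)   # a hard intervention overrides the mechanism
    y = do.get("Y", x)
    return (x, y)


def scm_b(u, do=None):
    """Toy SCM B (assumed for illustration): Y := U, X := Y."""
    do = do or {}
    y = do.get("Y", u)
    x = do.get("X", y)
    return (x, y)


def distribution(scm, do=None):
    """Exact distribution over (X, Y) when the noise bit U is a fair coin."""
    dist = Counter()
    for u in (0, 1):
        dist[scm(u, do)] += Fraction(1, 2)
    return dict(dist)


# Observational answers coincide ...
obs_a = distribution(scm_a)
obs_b = distribution(scm_b)

# ... but the answers to do(X = 1) differ.
int_a = distribution(scm_a, {"X": 1})
int_b = distribution(scm_b, {"X": 1})

print("same observational distribution:", obs_a == obs_b)
print("same answer to do(X=1):", int_a == int_b)
```

In this toy case a single extra bit (the edge orientation) resolves the ambiguity. The paper's results concern how this residual description cost scales with the number of variables n.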

Abstract

Pearl's causal hierarchy shows that observational, interventional, and counterfactual queries are qualitatively distinct. We ask a quantitative version of this question: how many additional bits are needed to specify higher-rung causal answers once lower-rung answers are known? We formalize this via query-class description length, the Kolmogorov complexity of the answer oracle induced by an SCM for a class of queries. Our main construction gives binary acyclic SCMs whose observational distribution has constant description length, while the single-variable interventional answer oracle has description length Θ(n²). A degree-sensitive upper bound shows that finite-gate-schema SCMs of indegree d have observational-interventional gap at most O(nd log(en/d) + n log n), making the quadratic construction order-optimal in the dense regime and a rooted-tree construction order-optimal for bounded indegree. The quadratic separation persists under ε-accurate total-variation descriptions for every fixed ε < 1/4. At the next rung, the full hard-do interventional oracle can still leave a Θ(n) counterfactual description gap. A general ambiguity-to-bits theorem and Shannon analogue show that these gaps equal the logarithm of residual higher-rung ambiguity up to lower-order terms.
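
The order-optimality claims can be checked by substituting the two indegree regimes into the stated upper bound. The short calculation below is a sketch: the constants c and D are assumed here for illustration and do not appear in the abstract.

```latex
% Dense regime: indegree d = c n for a constant 0 < c \le 1 (c is an assumed constant).
n d \log\frac{en}{d} + n \log n
  = c\,n^{2} \log\frac{e}{c} + n \log n
  = O(n^{2}),
% which matches the \Theta(n^2) description length of the quadratic construction.

% Bounded indegree: 1 \le d \le D for a constant D (D is an assumed constant).
n d \log\frac{en}{d} + n \log n
  \le D\,n \log(en) + n \log n
  = O(n \log n).
```

Under these assumptions the bound collapses to O(n²) in the dense regime and to O(n log n) for bounded indegree, which is the sense in which a construction attaining those rates is order-optimal.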