Causal-Audit: A Framework for Risk Assessment of Assumption Violations in Time-Series Causal Discovery

arXiv cs.LG / 4/6/2026


Key Points

  • Causal-Audit is a framework for assessing the risk that time-series causal discovery is unreliable when key assumptions (stationarity, regular sampling, bounded temporal dependence, etc.) may be violated; unchecked violations can otherwise yield confident but incorrect causal graphs.
  • The method computes calibrated effect-size diagnostics for five assumption families—stationarity, irregularity, persistence, nonlinearity, and confounding proxies—and aggregates them into four risk scores with uncertainty intervals.
  • It includes an abstention-aware decision policy that recommends specific causal discovery methods only when evidence supports reliable inference, and otherwise opts to abstain to avoid misleading results.
  • Experiments on a synthetic atlas of 500 DGPs show strong calibration (AUROC > 0.95), a 62% reduction in false positives among recommended datasets, and 78% abstention on severe violations.
  • The framework’s recommend-or-abstain behavior is validated across 21 external evaluations (TimeGraph and CausalTime), and an open-source implementation is available.
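The recommend-or-abstain logic described above can be pictured with a minimal sketch. All names, thresholds, and the method-selection rule below are illustrative assumptions, not details from the paper:

```python
# Hypothetical sketch of an abstention-aware decision policy.
# Risk-score names and thresholds are illustrative, not from the paper.

def decide(risk_scores, abstain_threshold=0.5):
    """Recommend a discovery method only when every calibrated risk
    score is low; otherwise abstain rather than risk a misleading graph."""
    if max(risk_scores.values()) > abstain_threshold:
        return ("abstain", None)
    # Illustrative rule: prefer PCMCI+ when nonlinearity risk is notable,
    # otherwise fall back to VAR-based Granger causality.
    method = "PCMCI+" if risk_scores["nonlinearity"] > 0.2 else "VAR-Granger"
    return ("recommend", method)

risks = {"stationarity": 0.10, "irregularity": 0.15,
         "persistence": 0.20, "nonlinearity": 0.35}
print(decide(risks))  # → ('recommend', 'PCMCI+')
```

The key design point mirrored here is that abstention is the default under severe violations: no method is recommended unless all risk scores clear the bar.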

Abstract

Time-series causal discovery methods rely on assumptions such as stationarity, regular sampling, and bounded temporal dependence. When these assumptions are violated, structure learning can produce confident but misleading causal graphs without warning. We introduce Causal-Audit, a framework that formalizes assumption validation as calibrated risk assessment. The framework computes effect-size diagnostics across five assumption families (stationarity, irregularity, persistence, nonlinearity, and confounding proxies), aggregates them into four calibrated risk scores with uncertainty intervals, and applies an abstention-aware decision policy that recommends methods (e.g., PCMCI+, VAR-based Granger causality) only when evidence supports reliable inference. The semi-automatic diagnostic stage can also be used independently for structured assumption auditing in individual studies. Evaluation on a synthetic atlas of 500 data-generating processes (DGPs) spanning 10 violation families demonstrates well-calibrated risk scores (AUROC > 0.95), a 62% false positive reduction among recommended datasets, and 78% abstention on severe-violation cases. On 21 external evaluations from TimeGraph (18 categories) and CausalTime (3 domains), recommend-or-abstain decisions are consistent with benchmark specifications in all cases. An open-source implementation of our framework is available.
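To make the notion of an effect-size diagnostic concrete, here is one crude stationarity check: a standardized shift between the first- and second-half means of a series. This is a simplified stand-in under our own assumptions; the paper's actual diagnostics are not specified in this summary:

```python
# Crude mean-shift effect size as a stationarity diagnostic (illustrative
# only; not the diagnostic used by Causal-Audit).
import statistics

def mean_shift_effect(x):
    """Absolute difference between first- and second-half means,
    standardized by the overall standard deviation. Large values
    suggest a drifting (nonstationary) mean."""
    h = len(x) // 2
    first, second = x[:h], x[h:]
    sd = statistics.pstdev(x) or 1.0  # guard against constant series
    return abs(statistics.mean(second) - statistics.mean(first)) / sd

stable = [0.0, 0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1]
drifting = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
print(mean_shift_effect(stable) < mean_shift_effect(drifting))  # → True
```

In the framework's terms, such per-family effect sizes would then be calibrated and aggregated into risk scores with uncertainty intervals before the decision policy is applied.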