From Local to Cluster: A Unified Framework for Causal Discovery with Latent Variables

arXiv cs.AI / 4/27/2026


Key Points

  • The paper addresses how latent variables complicate causal discovery, noting that existing local and cluster-level approaches each have key limitations (e.g., needing known clusters or assuming causal sufficiency).
  • It introduces L2C (Local to Cluster Causal Abstraction), which automatically discovers how micro variables should be partitioned into clusters using local causal patterns rather than requiring manual cluster assignments.
  • L2C uses a cluster reduction theorem to compress each cluster to at most three nodes without losing causal information, then performs local causal discovery to learn direct causes, effects, and V-structures under latent-variable settings.
  • At the macro level, it constructs a cluster graph and applies cluster-level calculus to perform causal inference, explicitly avoiding causal-sufficiency assumptions by handling latent variables locally.
  • Theoretical results claim soundness, atomic completeness, and computational efficiency, and experiments on both synthetic and real data report accurate cluster recovery and improved macro causal effect identification versus baselines.
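The macro-level step in the key points can be illustrated with a minimal sketch. It assumes the partition of micro variables is already available and uses the common convention that cluster A points to cluster B whenever some micro variable in A directly causes one in B. The function names (`build_cluster_graph`, `is_acyclic`) and the toy variables are hypothetical and are not taken from the paper; L2C's own partition discovery and cluster reduction steps are not reproduced here.

```python
from collections import defaultdict


def build_cluster_graph(micro_edges, partition):
    """Collapse micro-level directed edges into cluster-level edges.

    micro_edges: iterable of (cause, effect) variable-name pairs.
    partition: dict mapping cluster name -> set of variable names.
    (Both input formats are assumptions made for this illustration.)
    """
    # Invert the partition so each micro variable maps to its cluster.
    cluster_of = {}
    for cluster, members in partition.items():
        for var in members:
            if var in cluster_of:
                raise ValueError(f"{var} assigned to more than one cluster")
            cluster_of[var] = cluster

    macro_edges = set()
    for cause, effect in micro_edges:
        c_from, c_to = cluster_of[cause], cluster_of[effect]
        if c_from != c_to:  # edges inside a cluster vanish at the macro level
            macro_edges.add((c_from, c_to))
    return macro_edges


def is_acyclic(nodes, edges):
    """Kahn's algorithm: return True if the directed graph has no cycle."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
        indegree[b] += 1
    frontier = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while frontier:
        node = frontier.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)
    return visited == len(indegree)


# Toy micro graph: five variables grouped into three hypothetical clusters.
micro_edges = [("a1", "a2"), ("a2", "b1"), ("b1", "c1"), ("a1", "c2")]
partition = {"A": {"a1", "a2"}, "B": {"b1"}, "C": {"c1", "c2"}}

macro_edges = build_cluster_graph(micro_edges, partition)
print(sorted(macro_edges))
print(is_acyclic(partition.keys(), macro_edges))
```

The acyclicity check matters because an arbitrary partition can turn an acyclic micro graph into a cyclic cluster graph, in which case the partition is not usable for cluster-level reasoning.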

Abstract

Latent variables pose a fundamental challenge to causal discovery and inference. Conventional local methods focus on direct neighbors but fail to provide macro-level insights. Cluster-level methods enable macro causal reasoning but either assume clusters are known a priori or require causal sufficiency. Moreover, directly applying single-variable causal discovery methods to cluster-level problems violates causal sufficiency and leads to incorrect results. To overcome these limitations, this paper proposes L2C (Local to Cluster Causal Abstraction), a unified framework that bridges local structure learning and cluster-level causal discovery. Unlike prior work that requires a complete manual assignment of micro variables to clusters, L2C discovers the partition automatically from local causal patterns. Our solution leverages a cluster reduction theorem to reduce any cluster to at most three nodes without loss of causal information, applies local causal discovery to identify direct causes, effects, and V-structures in the presence of latent variables, and performs macro-level causal inference via cluster-level calculus on the learned cluster graph. L2C does not assume causal sufficiency, as latent variables are handled through local discovery. Theoretical analysis shows that L2C ensures soundness, atomic completeness, and computational efficiency. Extensive experiments on synthetic and real-world data demonstrate that L2C accurately recovers ground-truth clusters and achieves superior macro causal effect identification compared to existing baselines.
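For readers unfamiliar with the term, the V-structures mentioned in the abstract can be illustrated with the standard collider rule from constraint-based discovery: for an unshielded triple X - Z - Y (X and Y not adjacent), orient X -> Z <- Y when Z is absent from the set that separated X and Y. The sketch below is a generic version of that rule, not L2C's local procedure; the function name `find_v_structures`, the input formats, and the toy skeleton are all assumptions made for illustration.

```python
from itertools import combinations


def find_v_structures(adjacency, sepsets):
    """Return colliders as (x, z, y) triples, meaning x -> z <- y.

    adjacency: dict mapping node -> set of neighbours in the undirected skeleton.
    sepsets: dict mapping frozenset({x, y}) -> set of nodes that rendered
             x and y conditionally independent (hypothetical input format).
    """
    colliders = []
    for z in sorted(adjacency):
        for x, y in combinations(sorted(adjacency[z]), 2):
            if y in adjacency[x]:
                continue  # shielded triple: x and y are adjacent, rule does not apply
            sep = sepsets.get(frozenset((x, y)))
            # Orient only when a separating set is recorded and excludes z.
            if sep is not None and z not in sep:
                colliders.append((x, z, y))
    return colliders


# Toy skeleton: X - Z - Y with X, Y marginally independent, plus Z - W.
adjacency = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y", "W"}, "W": {"Z"}}
sepsets = {
    frozenset(("X", "Y")): set(),   # X and Y separated by the empty set
    frozenset(("X", "W")): {"Z"},   # X and W separated given Z
    frozenset(("Y", "W")): {"Z"},   # Y and W separated given Z
}

colliders = find_v_structures(adjacency, sepsets)
print(colliders)
```

In this toy input only the triple X, Z, Y qualifies, since Z belongs to the separating sets of the other two unshielded pairs.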