ClimateCause: Complex and Implicit Causal Structures in Climate Reports

arXiv cs.CL / 4/17/2026

Key Points

  • The paper introduces ClimateCause, a new expert-annotated dataset built from science-for-policy climate reports. It captures higher-order causal structures, including implicit and nested causality, that existing datasets miss because they focus mainly on explicit, direct causal relations.
  • It normalizes and disentangles cause-effect expressions into individual causal relations, while adding metadata for correlation type, relation type, and spatiotemporal context to support graph construction.
  • The authors show that ClimateCause can be used to quantify the readability of climate statements, based on the semantic complexity of the causal graph underlying each statement.
  • Benchmarking with large language models indicates that causal chain reasoning is a particularly challenging problem compared with correlation inference on this dataset.
  • Overall, the dataset and experiments aim to improve causal reasoning and evaluation for climate-change understanding and science-to-policy communication.
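The points above describe disentangled relations that can be assembled into a graph whose complexity relates to readability. The following is a minimal Python sketch of that idea. It assumes each relation is stored as a cause-effect pair with metadata. The `CausalRelation` record, its field names and label values, the example statement, and the three structural proxies (node count, edge count, longest causal chain) are hypothetical illustrations. They are not ClimateCause's actual schema or the paper's readability measure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalRelation:
    """One disentangled cause-effect pair (hypothetical schema)."""
    cause: str
    effect: str
    correlation: str     # hypothetical label, e.g. "positive" or "negative"
    relation_type: str   # hypothetical label, e.g. "explicit" or "implicit"
    context: str = ""    # optional spatiotemporal context (free text here)


def build_graph(relations):
    """Map each cause to the set of effects it points to."""
    graph = defaultdict(set)
    for r in relations:
        graph[r.cause].add(r.effect)
    return graph


def longest_chain(graph):
    """Length (in edges) of the longest cause-effect chain; raises on cycles."""
    memo = {}

    def depth(node, visiting):
        if node in memo:
            return memo[node]
        if node in visiting:
            raise ValueError(f"cycle through {node!r}")
        visiting.add(node)
        best = max((1 + depth(n, visiting) for n in graph.get(node, ())), default=0)
        visiting.discard(node)
        memo[node] = best
        return best

    nodes = set(graph) | {e for effects in graph.values() for e in effects}
    return max((depth(n, set()) for n in nodes), default=0)


def complexity(relations):
    """Simple structural proxies for how complex a statement's causal graph is."""
    graph = build_graph(relations)
    nodes = set(graph) | {e for effects in graph.values() for e in effects}
    return {
        "nodes": len(nodes),
        "edges": sum(len(effects) for effects in graph.values()),
        "longest_chain": longest_chain(graph),
    }


# Illustrative (invented) statement: "Rising emissions increase warming, which
# drives sea-level rise and, via heat stress, reduces crop yields."
relations = [
    CausalRelation("emissions", "warming", "positive", "explicit"),
    CausalRelation("warming", "sea-level rise", "positive", "explicit"),
    CausalRelation("warming", "heat stress", "positive", "implicit"),
    CausalRelation("heat stress", "crop yields", "negative", "explicit"),
]
print(complexity(relations))  # {'nodes': 5, 'edges': 4, 'longest_chain': 3}
```

One plausible intuition, under these assumptions, is that the longest chain counts the hops a reader or model must follow. Answering a chain question requires composing several relations, while judging the correlation of a single cause-effect pair needs only one.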

Abstract

Understanding climate change requires reasoning over complex causal networks. Yet, existing causal discovery datasets predominantly capture explicit, direct causal relations. We introduce ClimateCause, a manually expert-annotated dataset of higher-order causal structures from science-for-policy climate reports, including implicit and nested causality. Cause-effect expressions are normalized and disentangled into individual causal relations to facilitate graph construction, with unique annotations for cause-effect correlation, relation type, and spatiotemporal context. We further demonstrate ClimateCause's value for quantifying readability based on the semantic complexity of causal graphs underlying a statement. Finally, large language model benchmarking on correlation inference and causal chain reasoning highlights the latter as a key challenge.