Retrieving Climate Change Disinformation by Narrative

arXiv cs.CL / 3/24/2026

Key Points

  • The paper argues that detecting climate change disinformation using fixed narrative taxonomies fails when new narratives emerge, so it reframes detection as a retrieval problem rather than a closed-set classification task.
  • It proposes SpecFi, a framework that generates hypothetical documents to bridge abstract narrative descriptions and their concrete text instances, using community summaries from graph-based community detection as few-shot examples for generation.
  • The authors repurpose three existing climate disinformation datasets (CARDS, Climate Obstruction, and the climate change subset of PolyNarrative) for retrieval evaluation and report that SpecFi achieves a MAP of 0.505 on CARDS without access to narrative labels.
  • The study introduces narrative variance, an embedding-based difficulty metric, and shows via partial correlation analysis that standard retrieval degrades on high-variance narratives (BM25 loses 63.4% of its MAP), while SpecFi-CS is more robust (32.7% loss).
  • It also shows that unsupervised community summaries can converge toward expert-like taxonomy descriptions, suggesting graph methods can recover narrative structure from unlabeled text.
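
The MAP figures above are easier to read with the metric spelled out. The following is a minimal sketch of mean average precision in the retrieval framing the paper describes, where each narrative description is a query and the texts labeled with that narrative are the relevant documents. The function names, document IDs, and toy rankings are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Dict, List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision for one query (here: one narrative description)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at this rank, counted only where a relevant text appears.
            precision_sum += hits / rank
    # Dividing by the number of relevant texts penalizes texts never retrieved.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    rankings: Dict[str, List[str]], relevance: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all narrative queries."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, relevance.get(query, set()))
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: two narrative queries, four texts.
rankings = {
    "narrative_a": ["d1", "d2", "d3", "d4"],
    "narrative_b": ["d3", "d1", "d2", "d4"],
}
relevance = {
    "narrative_a": {"d1", "d3"},
    "narrative_b": {"d2"},
}
map_score = mean_average_precision(rankings, relevance)
print(f"MAP = {map_score:.3f}")
```

In this toy case the first query places its relevant texts at ranks 1 and 3, and the second places its single relevant text at rank 3, so the score rewards systems that put narrative-aligned texts near the top of the ranking.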

Abstract

Detecting climate disinformation narratives typically relies on fixed taxonomies, which do not accommodate emerging narratives. Thus, we re-frame narrative detection as a retrieval task: given a narrative's core message as a query, rank texts from a corpus by alignment with that narrative. This formulation requires no predefined label set and can accommodate emerging narratives. We repurpose three climate disinformation datasets (CARDS, Climate Obstruction, climate change subset of PolyNarrative) for retrieval evaluation and propose SpecFi, a framework that generates hypothetical documents to bridge the gap between abstract narrative descriptions and their concrete textual instantiations. SpecFi uses community summaries from graph-based community detection as few-shot examples for generation, achieving a MAP of 0.505 on CARDS without access to narrative labels. We further introduce narrative variance, an embedding-based difficulty metric, and show via partial correlation analysis that standard retrieval degrades on high-variance narratives (BM25 loses 63.4% of MAP), while SpecFi-CS remains robust (32.7% loss). Our analysis also reveals that unsupervised community summaries converge on descriptions close to expert-crafted taxonomies, suggesting that graph-based methods can surface narrative structure from unlabeled text.
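
The abstract describes narrative variance only as an embedding-based difficulty metric and does not give its formula. As a rough illustration of what such a metric could look like, the sketch below assumes one plausible definition: the mean squared distance between the L2-normalized embeddings of a narrative's texts and their centroid. The definition, the function names, and the toy two-dimensional vectors are all assumptions made for illustration and may differ from the paper's actual metric.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def narrative_variance(embeddings: Sequence[Sequence[float]]) -> float:
    """Assumed definition: mean squared distance of unit embeddings to their centroid."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    unit = [l2_normalize(v) for v in embeddings]
    dim = len(unit[0])
    count = len(unit)
    centroid = [sum(v[i] for v in unit) / count for i in range(dim)]
    squared_distances = [
        sum((v[i] - centroid[i]) ** 2 for i in range(dim)) for v in unit
    ]
    return sum(squared_distances) / count


# Hypothetical embeddings: texts of a "tight" narrative point the same way,
# texts of a "spread" narrative point in different directions.
tight_narrative = [[1.0, 0.0], [2.0, 0.0]]
spread_narrative = [[1.0, 0.0], [0.0, 1.0]]

tight_score = narrative_variance(tight_narrative)
spread_score = narrative_variance(spread_narrative)
print(tight_score, spread_score)
```

Under this assumed definition, a narrative whose texts are phrased in many different ways gets a higher score than one whose texts cluster tightly, which matches the intuition that high-variance narratives are harder for a single query to retrieve.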