A Multi-Agent Feedback System for Detecting and Describing News Events in Satellite Imagery

arXiv cs.CV / 4/15/2026


Key Points

  • The paper argues that while bi-temporal change captioning exists, there is a lack of multi-temporal satellite event captioning datasets that use at least two images per sequence, largely due to search and labeling costs.
  • It introduces SkyScraper, an iterative multi-agent workflow that geocodes news articles and then synthesizes captions for matching multi-temporal satellite imagery.
  • Experiments show SkyScraper finds 5× more events than traditional geocoding methods, which the authors take as evidence that agentic feedback helps surface new multi-temporal events.
  • The authors apply the system to a large corpus of global news and curate a new dataset with 5,000 multi-temporal captioning sequences.
  • The work positions automated imagery-event linkage and captioning as a support tool for journalism and reporting by identifying relevant satellite evidence for news events.

Abstract

Changes in satellite imagery often occur over multiple time steps. Despite the emergence of bi-temporal change captioning datasets, there is a lack of multi-temporal event captioning datasets (at least two images per sequence) in remote sensing. This gap exists because (1) searching for visible events in satellite imagery and (2) labeling multi-temporal sequences require significant time and labor. To address these challenges, we present SkyScraper, an iterative multi-agent workflow that geocodes news articles and synthesizes captions for corresponding satellite image sequences. Our experiments show that SkyScraper successfully finds 5x more events than traditional geocoding methods, demonstrating that agentic feedback is an effective strategy for surfacing new multi-temporal events in satellite imagery. We apply our framework to a large database of global news articles, curating a new multi-temporal captioning dataset with 5,000 sequences. By automatically identifying imagery related to news events, our work also supports journalism and reporting efforts.
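The abstract describes the workflow only at a high level, so the following is a minimal sketch of what an iterative feedback loop between a geocoding agent and a verifying agent could look like. It is not the authors' implementation. Every name here (`Candidate`, `locate_event`, `toy_geocode`, `toy_verify`, `max_rounds`) is hypothetical, and the two toy agents are stand-ins for whatever models the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    """A guessed event location (hypothetical structure, not from the paper)."""
    lat: float
    lon: float


# Assumed agent interfaces:
#   geocoder(article, feedback) -> Candidate
#   verifier(article, candidate) -> None if accepted, else a feedback string
Geocoder = Callable[[str, Optional[str]], Candidate]
Verifier = Callable[[str, Candidate], Optional[str]]


def locate_event(article: str, geocode: Geocoder, verify: Verifier,
                 max_rounds: int = 3) -> Optional[Candidate]:
    """Alternate between proposing a location and critiquing it.

    Returns the first accepted candidate, or None if no candidate is
    accepted within max_rounds.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = geocode(article, feedback)
        feedback = verify(article, candidate)
        if feedback is None:  # verifier accepted the location
            return candidate
    return None


# Toy agents so the sketch runs end to end.
def toy_geocode(article: str, feedback: Optional[str]) -> Candidate:
    # First guess is coarse; a refined guess is returned once feedback arrives.
    if feedback is None:
        return Candidate(10.0, 20.0)
    return Candidate(10.5, 20.5)


def toy_verify(article: str, candidate: Candidate) -> Optional[str]:
    # Accept only the refined location; otherwise explain what went wrong.
    if (candidate.lat, candidate.lon) == (10.5, 20.5):
        return None
    return "event not visible at this location; try the site named in the article"


result = locate_event("Flooding reported near the river port.", toy_geocode, toy_verify)
```

In this toy run the first guess is rejected and the second is accepted, which illustrates how feedback could recover events that a single-pass geocoder would miss. A captioning step for the matched image sequence would follow, but it is omitted here because the abstract gives no detail on it.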