TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation
arXiv cs.CV · March 20, 2026
Key Points
- TerraScope introduces a unified vision-language model that achieves pixel-grounded geospatial reasoning for Earth observation.
- It supports modality-flexible reasoning, fusing optical and SAR inputs when both are available and handling single-modality inputs when needed.
- It enables multi-temporal reasoning by integrating sequences across time for change analysis.
- The Terra-CoT dataset contains 1 million samples with pixel-level masks embedded in reasoning chains, and TerraScope-Bench provides six sub-tasks to evaluate both answer accuracy and mask quality.
- Experiments show TerraScope significantly outperforms existing VLMs and provides interpretable visual evidence, signaling a potential shift in EO multi-modal analytics.
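TerraScope-Bench scores both answer accuracy and mask quality. The paper summary does not specify the mask metric, but a standard choice for comparing a predicted binary mask against a ground-truth mask is intersection-over-union (IoU); the sketch below is a minimal, hypothetical illustration of that metric, not TerraScope's actual evaluation code.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two binary masks (hypothetical metric sketch)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

# Toy example: a 2x2 predicted region inside a 3x3 ground-truth region
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:3] = True  # 4 pixels
gt = np.zeros((4, 4), dtype=bool); gt[1:4, 1:4] = True      # 9 pixels
print(round(mask_iou(pred, gt), 3))  # intersection 4, union 9 -> 0.444
```

A benchmark in this style would typically report mean IoU over all samples alongside answer accuracy, so that a model is rewarded only when its textual answer and its visual evidence are both correct.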