TerraScope: Pixel-Grounded Visual Reasoning for Earth Observation

arXiv cs.CV / 3/20/2026

Key Points

TerraScope introduces a unified vision-language model that achieves pixel-grounded geospatial reasoning for Earth observation.
It supports modality-flexible reasoning, fusing optical and SAR inputs when both are available and handling single-modality inputs when needed.
It enables multi-temporal reasoning by integrating sequences across time for change analysis.
The Terra-CoT dataset contains 1 million samples with pixel-level masks embedded in reasoning chains, and TerraScope-Bench provides six sub-tasks to evaluate both answer accuracy and mask quality.
Experiments show TerraScope significantly outperforms existing VLMs on pixel-grounded geospatial reasoning while providing interpretable visual evidence for its answers.

Abstract

Vision-language models (VLMs) have shown promise in earth observation (EO), yet they struggle with tasks that require grounding complex spatial reasoning in precise pixel-level visual representations. To address this problem, we introduce TerraScope, a unified VLM that delivers pixel-grounded geospatial reasoning with two key capabilities: (1) modality-flexible reasoning: it handles single-modality inputs (optical or SAR) and adaptively fuses different modalities into the reasoning process when both are available; (2) multi-temporal reasoning: it integrates temporal sequences for change analysis across multiple time points. In addition, we curate Terra-CoT, a large-scale dataset containing 1 million samples with pixel-level masks embedded in reasoning chains across multiple sources. We also propose TerraScope-Bench, the first benchmark for pixel-grounded geospatial reasoning with six sub-tasks that evaluates both answer accuracy and mask quality to ensure authentic pixel-grounded reasoning. Experiments show that TerraScope significantly outperforms existing VLMs on pixel-grounded geospatial reasoning while providing interpretable visual evidence.
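To make the idea of scoring both answer accuracy and mask quality concrete, here is a minimal sketch. It is not TerraScope-Bench's actual protocol, which the abstract does not specify. It assumes mask quality is measured with intersection over union (IoU) and that a sample counts as grounded-correct only when the answer matches and the IoU reaches a threshold. The names `mask_iou`, `score_samples`, the sample dictionary keys, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Sequence

# Assumption: a mask is a 2-D grid of 0/1 values, 1 = pixel selected.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, ref: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                inter += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return 1.0 if union == 0 else inter / union


def score_samples(samples: List[dict], iou_threshold: float = 0.5) -> Dict[str, float]:
    """Report answer accuracy, mean mask IoU, and a joint 'grounded' accuracy.

    Hypothetical scoring rule: a sample is grounded-correct only if the
    answer is right AND the predicted mask overlaps the reference enough.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    answer_hits = 0
    grounded_hits = 0
    iou_sum = 0.0
    for s in samples:
        answer_ok = s["pred_answer"] == s["ref_answer"]
        iou = mask_iou(s["pred_mask"], s["ref_mask"])
        answer_hits += answer_ok
        grounded_hits += answer_ok and iou >= iou_threshold
        iou_sum += iou
    return {
        "answer_accuracy": answer_hits / n,
        "mean_iou": iou_sum / n,
        "grounded_accuracy": grounded_hits / n,
    }


# Toy data (invented for illustration): both answers are right,
# but only the first prediction points at the right pixels.
samples = [
    {"pred_answer": "B", "ref_answer": "B",
     "pred_mask": [[1, 1], [0, 0]], "ref_mask": [[1, 1], [0, 0]]},
    {"pred_answer": "A", "ref_answer": "A",
     "pred_mask": [[1, 0], [0, 0]], "ref_mask": [[0, 0], [0, 1]]},
]
report = score_samples(samples)
print(report)
```

On this toy data the answer accuracy is 1.0 while the joint score drops to 0.5, which illustrates why a benchmark might report mask quality alongside answers: a model can reach the right answer without looking at the right pixels.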
