RS-WorldModel: a Unified Model for Remote Sensing Understanding and Future Sense Forecasting

arXiv cs.AI / 3/17/2026



Key Points

RS-WorldModel unifies spatiotemporal change understanding and text-guided future scene forecasting in remote sensing, enabling cross-task transfer within a single model.
The approach employs a three-stage training pipeline: Geo-Aware Generative Pre-training (GAGP), synergistic instruction tuning (SIT), and verifiable reinforcement optimization (VRO) to optimize both understanding and forecasting tasks.
The work introduces RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations for both spatiotemporal understanding and forecasting tasks.
With only 2 billion parameters, RS-WorldModel surpasses open-source models up to 120× larger on most spatiotemporal change QA metrics. It also achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines and the closed-source Gemini-2.5-Flash Image.
The model points to stronger cross-task performance and parameter efficiency in remote sensing, with potential applications in environmental monitoring, disaster response, and geospatial analytics.
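For context on the FID of 43.13 quoted above: Fréchet Inception Distance fits a Gaussian to feature embeddings (conventionally Inception-v3 activations) of real and generated images and reports FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^(1/2)), so lower is better. The summary does not state the paper's evaluation protocol (feature extractor, resolution, number of samples), so the following is only a minimal, generic sketch that assumes the features have already been extracted; the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Fréchet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1, sigma2 = np.asarray(sigma1, dtype=float), np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals Tr((S1^(1/2) S2 S1^(1/2))^(1/2)); the inner matrix is
    # symmetric PSD, so the eigendecomposition-based square root applies.
    s1_half = _sqrtm_psd(sigma1)
    cross = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross))


def fid_from_features(real_feats, gen_feats) -> float:
    """FID from two (N, D) arrays of pre-extracted image features (assumed input)."""
    real = np.asarray(real_feats, dtype=float)
    gen = np.asarray(gen_feats, dtype=float)
    return frechet_distance(
        real.mean(axis=0), np.cov(real, rowvar=False),
        gen.mean(axis=0), np.cov(gen, rowvar=False),
    )
```

Because FID values depend on the feature extractor and sample count, a number computed this way is only comparable to the reported 43.13 if it follows the same protocol as the paper.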

Abstract

Remote sensing world models aim to both explain observed changes and forecast plausible futures, two tasks that share spatiotemporal priors. Existing methods, however, typically address them separately, limiting cross-task transfer. We present RS-WorldModel, a unified world model for remote sensing that jointly handles spatiotemporal change understanding and text-guided future scene forecasting, and we build RSWBench-1.1M, a 1.1 million sample dataset with rich language annotations covering both tasks. RS-WorldModel is trained in three stages: (1) Geo-Aware Generative Pre-training (GAGP) conditions forecasting on geographic and acquisition metadata; (2) synergistic instruction tuning (SIT) jointly trains understanding and forecasting; (3) verifiable reinforcement optimization (VRO) refines outputs with verifiable, task-specific rewards. With only 2B parameters, RS-WorldModel surpasses open-source models up to 120× larger on most spatiotemporal change question-answering metrics. It achieves an FID of 43.13 on text-guided future scene forecasting, outperforming all open-source baselines as well as the closed-source Gemini-2.5-Flash Image (Nano Banana).
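The abstract says the VRO stage refines outputs with verifiable, task-specific rewards, but it does not spell out the reward functions. As an illustration of what a verifiable reward can look like on the question-answering side, here is a minimal rule-based sketch that parses a final answer and checks it against the reference by normalized exact match, plus a small bonus for following the answer format. The `Answer:` convention, the function names, and the 0.1 bonus are assumptions made for illustration, not the authors' design; a forecasting reward would need a different, image-level check that is not shown.

```python
import re

# Hypothetical output convention: the model is assumed to state its final
# answer on a line of the form "Answer: <text>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE)


def normalize(text: str) -> str:
    """Lower-case, strip punctuation, and collapse whitespace for a lenient exact match."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def qa_reward(model_output: str, reference: str, format_bonus: float = 0.1) -> float:
    """Rule-based, verifiable reward for one change-QA sample (illustrative only).

    Returns 0.0 if no answer can be parsed, format_bonus if an answer is parsed
    but wrong, and 1.0 + format_bonus if it matches the reference.
    """
    matches = ANSWER_PATTERN.findall(model_output)
    if not matches:
        return 0.0  # no parsable answer: no reward at all
    predicted = normalize(matches[-1])  # use the last answer if several are present
    correct = predicted == normalize(reference)
    return (1.0 if correct else 0.0) + format_bonus
```

For example, `qa_reward("Answer: No.", "yes")` returns 0.1: the output is well-formed but wrong. Rewards of this kind are "verifiable" in the sense that they are computed by a deterministic check rather than by a learned reward model.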
