AI Navigate

AGCD: Agent-Guided Cross-Modal Decoding for Weather Forecasting

arXiv cs.AI / 3/17/2026

Key Points

  • AGCD is a decoding-time framework that injects state-conditioned physics priors into weather forecasters to improve autoregressive stability and physical consistency.
  • It builds a multi-agent meteorological narration pipeline that uses multi-modal large language models (MLLMs) to extract diverse meteorological elements and derive priors from the current atmospheric state.
  • The method introduces cross-modal region interaction decoding with region-aware multi-scale tokenization to refine visual features without changing the forecaster backbone interface.
  • Experiments on WeatherBench show consistent gains for 6-hour forecasts across two resolutions (5.625° and 1.40625°) and across both generic and weather-specialized backbones. In strictly causal 48-hour autoregressive rollouts, AGCD also reduces early-stage error growth.
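
To make the last point concrete: a 48-hour autoregressive rollout built from a 6-hour forecaster is 8 chained steps, each fed only the previous prediction. The sketch below is a toy illustration of why one-step errors compound. The names `rollout` and `toy_step` are hypothetical, and the 2% bias is an invented stand-in for model error, not a result from the paper.

```python
from typing import Callable, List

State = List[float]  # toy stand-in for a gridded, multivariate atmospheric state


def rollout(step: Callable[[State], State], initial: State,
            horizon_hours: int = 48, step_hours: int = 6) -> List[State]:
    """Apply a one-step forecaster repeatedly, feeding each output back in."""
    if horizon_hours % step_hours != 0:
        raise ValueError("horizon must be a multiple of the step length")
    states: List[State] = []
    current = initial
    for _ in range(horizon_hours // step_hours):
        # Strictly causal: the next input is the model's own previous output.
        current = step(current)
        states.append(current)
    return states


def toy_step(state: State) -> State:
    # Hypothetical forecaster: the true state is assumed constant, and the
    # model adds a 2% multiplicative bias at every step (an assumption).
    return [1.02 * x for x in state]


initial_state = [1.0, 2.0]
forecasts = rollout(toy_step, initial_state)
lead_times = [6 * (i + 1) for i in range(len(forecasts))]  # hours
# Absolute error of the first variable against the (constant) truth.
errors = [abs(f[0] - initial_state[0]) for f in forecasts]
```

Here `errors` grows at every lead time because each step inherits the bias of all earlier steps, which is the accumulation that a decoding-time correction would aim to dampen.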

Abstract

Accurate weather forecasting is more than grid-wise regression: it must preserve coherent synoptic structures and the physical consistency of meteorological fields, especially under autoregressive rollouts, where small one-step errors can amplify into structural bias. Existing physics-prior approaches typically impose global, once-and-for-all constraints via architectures, regularization, or NWP coupling, offering limited state-adaptive and sample-specific controllability at deployment. To bridge this gap, we propose Agent-Guided Cross-modal Decoding (AGCD), a plug-and-play decoding-time prior-injection paradigm that derives state-conditioned physics priors from the current multivariate atmosphere and injects them into forecasters in a controllable and reusable way. Specifically, we design a multi-agent meteorological narration pipeline that generates state-conditioned physics priors, using multi-modal large language models (MLLMs) to extract diverse meteorological elements. To apply the priors effectively, AGCD further introduces cross-modal region interaction decoding, which performs region-aware multi-scale tokenization and efficient physics-prior injection to refine visual features without changing the backbone interface. Experiments on WeatherBench demonstrate consistent gains for 6-hour forecasting across two resolutions (5.625° and 1.40625°) and diverse backbones (generic and weather-specialized), including strictly causal 48-hour autoregressive rollouts that reduce early-stage error accumulation and improve long-horizon stability.
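
The abstract gives no architectural detail, so the following is only a rough sketch of what "region-aware multi-scale tokenization" followed by prior injection could look like. Every choice here is an assumption for illustration: average pooling as the tokenizer, single-head cross-attention without learned projections, a fixed `gate` scalar, and prior embeddings that share the feature channel width. The names `region_tokens` and `inject_priors` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def region_tokens(features: np.ndarray, patch_sizes=(2, 4)) -> np.ndarray:
    """Average-pool a (C, H, W) feature map into patch tokens at several scales."""
    c, h, w = features.shape
    tokens = []
    for p in patch_sizes:
        if h % p or w % p:
            raise ValueError("patch size must divide the grid")
        # Split H and W into (blocks, p) and average within each p x p block.
        pooled = features.reshape(c, h // p, p, w // p, p).mean(axis=(2, 4))
        tokens.append(pooled.reshape(c, -1).T)  # (num_patches, C)
    return np.concatenate(tokens, axis=0)


def inject_priors(tokens: np.ndarray, prior_emb: np.ndarray,
                  gate: float = 0.1) -> np.ndarray:
    """Residual cross-attention: region tokens attend to prior embeddings."""
    d = tokens.shape[1]
    scores = tokens @ prior_emb.T / np.sqrt(d)       # (num_tokens, num_priors)
    scores -= scores.max(axis=1, keepdims=True)      # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    # Residual update keeps the token shape, so downstream layers are unaffected.
    return tokens + gate * (weights @ prior_emb)


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8, 8))   # toy backbone features (C, H, W)
prior_emb = rng.normal(size=(5, 16))     # toy embeddings of 5 textual priors
tokens = region_tokens(features)         # 16 fine + 4 coarse tokens
refined = inject_priors(tokens, prior_emb)
```

Because the update is residual and shape-preserving, setting `gate` to 0 returns the tokens unchanged, which is one simple way a module of this kind could leave the backbone interface untouched.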