Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction

arXiv cs.LG / 4/21/2026


Key Points

  • The paper presents a one-stage generative data assimilation (DA) framework that reformulates DA as Bayesian posterior sampling, replacing the traditional forecast-update cycle.
  • It introduces STORM, a spatiotemporal transformer whose global attention algorithm is described as scaling with linear rather than quadratic complexity, removing the quadratic attention bottleneck.
  • The authors report that on 32,768 GPUs of the Frontier supercomputer the method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance.
  • The method is scaled up to 20 billion spatiotemporal tokens, enabling km-scale global modeling across 177k temporal frames, which the authors say was previously out of reach.
  • The work targets a key bottleneck in exascale Earth system prediction: scalable, accurate inference of the Earth system state, which currently limits uncertainty quantification and the prediction of extreme events.
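
To put the reported figures in perspective, the short sketch below does some back-of-the-envelope arithmetic on them. It assumes that "1.6 ExaFLOP sustained" means 1.6e18 floating-point operations per second, that "177k" is a round figure, and that tokens are spread evenly over frames. The 1,024-GPU baseline and its timings are hypothetical, chosen only to illustrate how strong scaling efficiency is usually computed. None of this comes from the paper itself.

```python
# Figures quoted in the summary above.
GPUS = 32_768
SUSTAINED_FLOPS = 1.6e18   # assumption: "1.6 ExaFLOP sustained" read as FLOP per second
TOKENS = 20e9
FRAMES = 177_000           # "177k" taken as a round figure


def strong_scaling_efficiency(t_base, n_base, t_scaled, n_scaled):
    """Achieved speedup divided by ideal speedup, at fixed problem size."""
    return (t_base / t_scaled) / (n_scaled / n_base)


per_gpu_tflops = SUSTAINED_FLOPS / GPUS / 1e12   # average sustained rate per GPU
tokens_per_frame = TOKENS / FRAMES               # assumes an even split across frames
pairwise_scores = TOKENS ** 2                    # entries in a dense N x N score matrix

# Hypothetical timings (not from the paper): a 1,024-GPU baseline taking 100 s.
t_base, n_base = 100.0, 1_024
t_scaled = t_base / ((GPUS / n_base) * 0.63)     # runtime that would yield 63%
efficiency = strong_scaling_efficiency(t_base, n_base, t_scaled, GPUS)

print(f"{per_gpu_tflops:.1f} TFLOP/s per GPU")
print(f"{tokens_per_frame:,.0f} tokens per frame")
print(f"{pairwise_scores:.1e} pairwise scores vs {TOKENS:.1e} tokens")
print(f"strong scaling efficiency: {efficiency:.2f}")
```

Under these assumptions the average works out to roughly 48.8 TFLOP/s per GPU. A dense attention matrix over 20 billion tokens would hold on the order of 4e20 scores, which is why an attention cost that grows linearly in the token count matters at this scale.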

Abstract

Accurate weather and climate prediction relies on data assimilation (DA), which estimates the Earth system state by integrating observations with models. While exascale computing has significantly advanced Earth simulation, scalable and accurate inference of the Earth system state remains a fundamental bottleneck, limiting uncertainty quantification and prediction of extreme events. We introduce a unified one-stage generative DA framework that reformulates assimilation as Bayesian posterior sampling, replacing the conventional forecast-update cycle with compute-dense, GPU-efficient inference. At the core is STORM, a novel spatiotemporal transformer with a global attention linear-complexity scaling algorithm that breaks the quadratic attention barrier. On 32,768 GPUs of the Frontier supercomputer, our method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance. We further scale to 20 billion spatiotemporal tokens, enabling km-scale global modeling over 177k temporal frames, regimes previously unreachable, establishing a new paradigm for Earth system prediction.
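
The abstract does not say how STORM reaches linear complexity, so the sketch below is not the authors' algorithm. It illustrates one generic, well-known route to global attention at linear cost: kernelized linear attention, where a positive feature map replaces the softmax and the keys and values are summarized once. The feature map `phi` (elu(x) + 1), the function names, and the toy inputs are all assumptions made for illustration.

```python
import math


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice for this sketch)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(Q, K, V):
    """Reference version: forms every query-key score, O(N^2) in tokens."""
    d_v = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        scores = [dot(fq, phi(k)) for k in K]
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / total
                    for c in range(d_v)])
    return out


def linear_attention(Q, K, V):
    """Same output, but keys and values are summarized once, O(N) in tokens."""
    d_k, d_v = len(K[0]), len(V[0])
    kv = [[0.0] * d_v for _ in range(d_k)]   # sum over j of phi(k_j) v_j^T
    k_sum = [0.0] * d_k                      # sum over j of phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            k_sum[a] += fk[a]
            for c in range(d_v):
                kv[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        norm = dot(fq, k_sum)
        out.append([sum(fq[a] * kv[a][c] for a in range(d_k)) / norm
                    for c in range(d_v)])
    return out


# Toy inputs: 3 tokens, key dimension 2, value dimension 2.
Q = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
K = [[0.1, 0.4], [-0.7, 1.2], [0.9, -0.5]]
V = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]]

slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
print(all(math.isclose(a, b, rel_tol=1e-9)
          for rs, rf in zip(slow, fast) for a, b in zip(rs, rf)))
```

Both functions let every query attend to every key, so the attention stays global. The linear version simply reorders the matrix products so that no N x N score matrix is ever built.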