(Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models

arXiv cs.LG / 4/21/2026

Key Points

  • The paper introduces Mosaic, a probabilistic ML weather forecasting model designed to prevent spectral degradation caused by training against ensemble means and by compressive encoding bottlenecks.
  • Mosaic generates ensemble members using learned functional perturbations and runs on native-resolution grids with block-sparse attention to capture long-range dependencies efficiently.
  • With 214M parameters at 1.5° resolution, Mosaic reportedly matches or surpasses models trained on data at six times finer resolution for key upper-air variables.
  • The model achieves state-of-the-art performance among 1.5° models while producing well-calibrated ensembles whose members maintain near-perfect spectral fidelity across resolved frequencies.
  • Inference is reported to be fast: a 24-member, 10-day forecast completes in under 12 seconds on a single NVIDIA H100 GPU.
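
The first source of spectral degradation named above, training against ensemble means, can be illustrated with a toy example. The sketch below is not from the paper: it uses a synthetic 1-D field of 240 points (one latitude circle at 1.5°), and the wavenumbers, amplitudes, and the helper `power_spectrum` are assumptions chosen for illustration. Every member shares a predictable large-scale wave but carries a small-scale wave with a random phase, so averaging the members cancels most of the small-scale power while each individual member retains it.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members, n_points = 24, 240  # 240 points = one latitude circle at 1.5 degrees
x = 2 * np.pi * np.arange(n_points) / n_points

# Assumed toy signal: a large-scale wave shared by all members (wavenumber 3)
# plus a small-scale wave (wavenumber 40) whose phase differs per member.
large_scale = np.sin(3 * x)
phases = rng.uniform(0, 2 * np.pi, size=(n_members, 1))
members = large_scale + 0.5 * np.sin(40 * x + phases)


def power_spectrum(field):
    """Power per wavenumber along the last axis (hypothetical helper)."""
    return np.abs(np.fft.rfft(field, axis=-1)) ** 2


member_power = power_spectrum(members).mean(axis=0)  # average of member spectra
mean_power = power_spectrum(members.mean(axis=0))    # spectrum of ensemble mean

ratio_low = mean_power[3] / member_power[3]     # large scale: preserved
ratio_high = mean_power[40] / member_power[40]  # small scale: mostly cancelled

print(f"power retained by ensemble mean at wavenumber 3:  {ratio_low:.3f}")
print(f"power retained by ensemble mean at wavenumber 40: {ratio_high:.3f}")
```

A deterministic model trained to match such a mean is rewarded for reproducing the smoothed field, which is the blurring that the probabilistic formulation is meant to avoid.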

Abstract

We introduce Mosaic, a probabilistic weather forecasting model that addresses two principal sources of spectral degradation in ML-based weather prediction: (1) deterministic training against ensemble means and (2) compressive encoding creating an information bottleneck. Mosaic generates ensemble members through learned functional perturbations and operates on native-resolution grids via block-sparse attention, a hardware-aligned mechanism that captures long-range dependencies at linear cost by sharing keys and values across spatially adjacent queries. At 1.5° resolution with 214M parameters, Mosaic matches or outperforms models trained on 6 times finer data on headline upper-air variables and achieves state-of-the-art results among 1.5° models, producing well-calibrated ensembles whose individual members exhibit near-perfect spectral alignment across all resolved frequencies. A 24-member, 10-day forecast takes under 12 seconds on a single H100 GPU.
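
The abstract does not spell out how block-sparse attention selects keys, so the following is only a minimal NumPy sketch of the general idea, not Mosaic's implementation. It assumes a flattened 1-D token sequence, and the function name `block_sparse_attention`, the mean-pooled scoring rule, and the `block_size` and `top_k` parameters are all hypothetical. Adjacent queries are grouped into blocks, each query block picks a few key/value blocks, and every query in that block attends to the same shared selection, so the attention cost per token is fixed at `top_k * block_size` keys regardless of sequence length.

```python
import numpy as np


def block_sparse_attention(q, k, v, block_size, top_k):
    """Toy block-sparse attention; the selection rule is an assumption."""
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    n_blocks = n // block_size
    qb = q.reshape(n_blocks, block_size, d)
    kb = k.reshape(n_blocks, block_size, d)
    vb = v.reshape(n_blocks, block_size, v.shape[-1])

    # Assumed selection rule: score block pairs by mean-pooled queries and keys.
    # This step is O(n_blocks^2) in this sketch; only the attention itself
    # below has cost linear in the number of tokens.
    block_scores = qb.mean(axis=1) @ kb.mean(axis=1).T
    selected = np.argsort(-block_scores, axis=1)[:, :top_k]

    # Gather the chosen key/value blocks once per query block, so all
    # queries in a block share the same keys and values.
    k_sel = kb[selected].reshape(n_blocks, top_k * block_size, d)
    v_sel = vb[selected].reshape(n_blocks, top_k * block_size, -1)

    # Standard scaled dot-product attention restricted to the selection.
    logits = qb @ k_sel.transpose(0, 2, 1) / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ v_sel
    return out.reshape(n, -1), selected


rng = np.random.default_rng(0)
n, d = 64, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, selected = block_sparse_attention(q, k, v, block_size=8, top_k=2)
print(out.shape, selected.shape)
```

Because keys and values are gathered per query block rather than per query, the inner computation is a batch of small dense matrix products, which is the kind of access pattern that maps well onto GPU hardware. Setting `top_k` equal to the number of blocks recovers ordinary dense attention.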