CycloneMAE: A Scalable Multi-Task Learning Model for Global Tropical Cyclone Probabilistic Forecasting

arXiv cs.LG / 4/15/2026


Key Points

  • The paper introduces CycloneMAE, a scalable multi-task learning model aimed at global tropical cyclone probabilistic forecasting by learning transferable representations from multi-modal data.
  • It uses a TC structure-aware masked autoencoder together with a pre-train/fine-tune setup and a discrete probabilistic gridding mechanism to output both deterministic forecasts and calibrated probability distributions.
  • Across five ocean basins, CycloneMAE reportedly outperforms leading NWP systems for pressure and wind forecasts up to 120 hours and for track forecasts up to 24 hours.
  • An attribution study using integrated gradients suggests the model’s decision-making is physically interpretable, with short-term predictions focusing on the cyclone’s internal convective core and longer-term forecasts shifting toward environmental factors.
  • The authors position the framework as a pathway toward operationally useful, probabilistic, interpretable, and scalable TC forecasting.
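The summary does not spell out how the discrete probabilistic gridding mechanism works, so the following is only a minimal sketch of the general idea it points to: a continuous target is divided into bins, the model scores each bin, and both a point forecast and an uncertainty interval are read off the resulting distribution. The bin range, the logits, and all function names here are illustrative assumptions, not values or APIs from the paper.

```python
import math

def softmax(logits):
    """Convert raw per-bin scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bin_centers(lo, hi, n_bins):
    """Centers of n_bins equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]

def expected_value(probs, centers):
    """Deterministic forecast: probability-weighted mean of the bin centers."""
    return sum(p * c for p, c in zip(probs, centers))

def quantile(probs, centers, q):
    """Center of the first bin at which the cumulative probability reaches q."""
    cumulative = 0.0
    for p, c in zip(probs, centers):
        cumulative += p
        if cumulative >= q:
            return c
    return centers[-1]

# Hypothetical setup: maximum wind speed binned from 0 to 80 m/s in 10 m/s steps.
centers = bin_centers(0.0, 80.0, 8)
# Hypothetical per-bin scores standing in for a model's output head.
logits = [0.0, 0.5, 2.0, 3.0, 2.0, 0.5, 0.0, -1.0]

probs = softmax(logits)
point = expected_value(probs, centers)
interval = (quantile(probs, centers, 0.05), quantile(probs, centers, 0.95))

print(f"point forecast: {point:.1f} m/s")
print(f"90% interval (bin centers): {interval[0]:.0f} to {interval[1]:.0f} m/s")
```

Reading the forecast from a binned distribution is what lets a single output head serve both purposes: the mean gives a deterministic value comparable to an NWP forecast, while the cumulative distribution gives an interval whose calibration can be checked against observations.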

Abstract

Tropical cyclones (TCs) rank among the most destructive natural hazards, yet their forecasting faces fundamental trade-offs: numerical weather prediction (NWP) models are computationally prohibitive and struggle to leverage historical data, while existing deep learning (DL)-based intelligent models are variable-specific and deterministic, and therefore fail to generalize across different forecasting variables. Here we present CycloneMAE, a scalable multi-task forecasting model that learns transferable TC representations from multi-modal data using a TC structure-aware masked autoencoder. By coupling a discrete probabilistic gridding mechanism with a pre-train/fine-tune paradigm, CycloneMAE simultaneously delivers deterministic forecasts and probability distributions. Evaluated across five global ocean basins, CycloneMAE outperforms leading NWP systems in pressure and wind forecasting up to 120 hours and in track forecasting up to 24 hours. Attribution analysis via integrated gradients reveals physically interpretable learning dynamics: short-term forecasts rely predominantly on the internal core convective structure from satellite imagery, whereas longer-term forecasts progressively shift attention to external environmental factors. Our framework establishes a scalable, probabilistic, and interpretable pathway for operational TC forecasting.
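For readers unfamiliar with integrated gradients, the attribution method named in the abstract, the sketch below shows the standard technique on a toy function. It averages the gradient of the output along a straight path from a baseline input to the actual input, then scales by the input difference. The toy model, the zero baseline, and the finite-difference gradient are illustrative assumptions chosen to keep the example dependency-free; they say nothing about how the authors applied the method to CycloneMAE.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x (stands in for autodiff)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def integrated_gradients(f, x, baseline, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        totals = [t + g for t, g in zip(totals, grad)]
    # Average gradient along the path, scaled by the input difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]

def toy_model(x):
    """Hypothetical scalar model: one nonlinear input, one linear input."""
    return x[0] ** 2 + 3.0 * x[1]

x = [2.0, 1.0]
baseline = [0.0, 0.0]  # assumed "no signal" reference input
attributions = integrated_gradients(toy_model, x, baseline)

print("attributions:", [round(a, 4) for a in attributions])
print("sum of attributions:", round(sum(attributions), 4))
print("output difference:", toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the model output at the input and at the baseline, a property known as completeness. That property is what makes it meaningful to compare how much of a forecast is credited to one group of inputs versus another, as the abstract does for core structure and environmental factors.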