Skillful Global Ocean Emulation and the Role of Correlation-Aware Loss

arXiv cs.AI / 4/22/2026

Key Points

  • The study adapts GraphCast into an ocean-only machine learning emulator that forecasts global ocean dynamics using prescribed atmospheric conditions for medium-range lead times.
  • Trained on NOAA’s UFS-Replay dataset with a 24-hour time step and a single initial condition, and without autoregressive training, the emulator produces skillful forecasts at 10–15 day lead times.
  • The researchers show that using a correlation-aware Mahalanobis-distance loss improves forecast accuracy over Mean Squared Error by explicitly modeling correlations among predicted variables’ tendencies.
  • Spatial correlation analyses of the forecasted fields suggest the correlation-aware loss acts as a statistical-dynamical regularizer for the slow, correlated dynamics of the global ocean, yielding a better background forecast for downstream tasks such as data assimilation.
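
To make the contrast between the two losses concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the way the covariance is estimated, and the `eps` regularization are all assumptions chosen for illustration. It only shows how a Mahalanobis-distance loss weights errors by the inverse covariance of the target tendencies, whereas Mean Squared Error treats every variable independently.

```python
import numpy as np

def fit_tendency_covariance(tendencies, eps=1e-6):
    """Estimate the covariance between target-variable tendencies.

    tendencies: array of shape (n_samples, n_vars).
    eps is a hypothetical diagonal jitter that keeps the matrix invertible.
    """
    cov = np.atleast_2d(np.cov(tendencies, rowvar=False))
    return cov + eps * np.eye(cov.shape[0])

def mse_loss(pred, target):
    """Plain Mean Squared Error: each variable is penalized independently."""
    return float(np.mean((pred - target) ** 2))

def mahalanobis_loss(pred, target, cov_inv):
    """Mean squared Mahalanobis distance between prediction and target.

    pred, target: arrays of shape (n_samples, n_vars).
    cov_inv: inverse covariance of the tendencies, shape (n_vars, n_vars).
    """
    err = pred - target
    # err_n^T @ cov_inv @ err_n for every sample n
    d2 = np.einsum("ni,ij,nj->n", err, cov_inv, err)
    return float(np.mean(d2))

# Toy case: two strongly correlated variables (assumed correlation 0.9).
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
cov_inv = np.linalg.inv(cov)
target = np.zeros((1, 2))
along = np.array([[1.0, 1.0]])     # error consistent with the correlation
against = np.array([[1.0, -1.0]])  # error that breaks the correlation

mse_along, mse_against = mse_loss(along, target), mse_loss(against, target)
maha_along = mahalanobis_loss(along, target, cov_inv)
maha_against = mahalanobis_loss(against, target, cov_inv)
```

In this toy case both errors have the same MSE, but the error that violates the correlation structure receives a much larger Mahalanobis penalty. That is the sense in which such a loss is correlation-aware.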

Abstract

Machine learning emulators have shown extraordinary skill in forecasting atmospheric states, and their application to global ocean dynamics offers similar promise. Here, we adapt the GraphCast architecture into a dedicated ocean-only emulator, driven by prescribed atmospheric conditions, for medium-range predictions. The emulator is trained on NOAA's UFS-Replay dataset. Using a 24-hour time step and a single initial condition, and without autoregressive training, we produce an emulator that provides skillful forecasts at 10-15 day lead times. We further demonstrate that using the Mahalanobis distance as a loss improves forecast skill relative to the Mean Squared Error loss by explicitly accounting for the correlations between tendencies of the target variables. Using spatial correlation analysis of the forecasted fields, we also show that the proposed correlation-aware loss acts as a statistical-dynamical regularizer for the slow, correlated dynamics of the global oceans, offering a better background forecast for downstream tasks like data assimilation.
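
The forecasting setup described in the abstract can be sketched as a simple rollout loop: a one-step emulator is applied repeatedly, one 24-hour step at a time, with the atmospheric forcing prescribed at each step. The sketch below rests on several assumptions that the abstract does not confirm: that the model predicts a tendency which is added to the current state, that "single initial condition" means one input state per step, and that the names `rollout` and `toy_step` are purely hypothetical. The toy step function stands in for the trained network.

```python
import numpy as np

def rollout(step_fn, initial_state, forcings):
    """Roll a one-step emulator forward in time.

    step_fn(state, forcing) -> predicted tendency over one 24-hour step
    initial_state: ocean state at forecast start, shape (n_points,)
    forcings: prescribed atmospheric forcing per step, shape (n_steps, n_points)
    Returns the forecast trajectory with shape (n_steps, n_points).
    """
    state = initial_state
    trajectory = []
    for forcing in forcings:
        # Assumed update rule: next state = current state + predicted tendency.
        state = state + step_fn(state, forcing)
        trajectory.append(state)
    return np.stack(trajectory)

def toy_step(state, forcing, rate=0.1):
    """Hypothetical stand-in for the trained emulator: relax toward the forcing."""
    return rate * (forcing - state)

# A 10-day forecast on a 3-point toy "ocean" with constant forcing.
initial_state = np.zeros(3)
forcings = np.ones((10, 3))
forecast = rollout(toy_step, initial_state, forcings)
```

Each forecast step feeds the previous prediction back in as input. On this reading, the abstract's "without autoregressive training" means such multi-step rollouts are not part of the training objective.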