Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

arXiv stat.ML / 4/1/2026

Key Points

  • The paper introduces WEATHER-5K, a larger and more realistic observational dataset aimed at improving time-series forecasting benchmarks for operational Global Station Weather Forecasting, where prior data were limited in size and spatiotemporal coverage.
  • It proposes PhysicsFormer, a physics-informed forecasting model that integrates a dynamic core with a Transformer residual, using losses for pressure–wind alignment and energy-aware smoothness to enforce physical consistency.
  • The authors benchmark PhysicsFormer and other time-series forecasting (TSF) models against operational Numerical Weather Prediction systems across multiple weather variables, extreme-event prediction, and varying model complexity.
  • The benchmark quantifies the performance gap between academic TSF methods and operational forecasting systems, particularly for complex weather dynamics and extreme events.
  • The dataset and benchmark implementation are released via GitHub, enabling reproducible evaluation and further research on operationally relevant forecasting.
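
The evaluation axes named in the key points (accuracy per weather variable and extreme-event prediction) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `per_variable_mae` and `extreme_event_recall`, the fixed-threshold definition of an "extreme", and the toy numbers are hypothetical and are not taken from the WEATHER-5K benchmark code.

```python
import numpy as np

def per_variable_mae(pred, target):
    """Mean absolute error per variable; inputs have shape (time, variables)."""
    return np.mean(np.abs(pred - target), axis=0)

def extreme_event_recall(pred, target, threshold):
    """Fraction of observed exceedances (target > threshold) that the
    forecast also places above the threshold. The threshold rule is an
    assumed definition of an extreme event, chosen only for illustration."""
    observed = target > threshold
    if not observed.any():
        return float("nan")  # no extremes observed, recall is undefined
    return float(np.mean(pred[observed] > threshold))

# Toy data: 4 time steps, 2 variables (values are made up).
target = np.array([[10.0, 2.0], [12.0, 8.0], [11.0, 9.0], [13.0, 3.0]])
pred = np.array([[11.0, 2.5], [12.0, 6.0], [10.0, 9.5], [13.0, 3.5]])

mae = per_variable_mae(pred, target)
recall = extreme_event_recall(pred[:, 1], target[:, 1], threshold=7.0)
print("MAE per variable:", mae)
print("Extreme-event recall (variable 1):", recall)
```

Reporting an error per variable alongside a separate extreme-event score keeps a model from looking strong on average while missing the rare events that matter operationally.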

Abstract

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.
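
The abstract names two physics-based loss terms but does not give their formulas. The sketch below shows one way such terms could be combined with a data-fitting loss, and it should be read as an illustration under stated assumptions rather than as PhysicsFormer's actual losses. Assumed here: the column order `[pressure, u, v]`, the stand-in definition of "pressure-wind alignment" as one minus the correlation between pressure tendency and wind-speed tendency magnitudes, the definition of "energy-aware smoothness" as a penalty on jumps in kinetic energy, and the weights `lam_align` and `lam_energy`.

```python
import numpy as np

def energy_smoothness_loss(u, v):
    """Assumed form: penalise abrupt step-to-step changes in the
    kinetic energy 0.5 * (u^2 + v^2) of the predicted wind."""
    energy = 0.5 * (u**2 + v**2)
    return float(np.mean(np.diff(energy) ** 2))

def pressure_wind_alignment_loss(pressure, u, v, eps=1e-8):
    """Assumed stand-in: 1 - Pearson correlation between the magnitude of
    the pressure tendency and the magnitude of the wind-speed tendency."""
    dp = np.abs(np.diff(pressure))
    ds = np.abs(np.diff(np.hypot(u, v)))
    dp_c = dp - dp.mean()
    ds_c = ds - ds.mean()
    denom = np.linalg.norm(dp_c) * np.linalg.norm(ds_c) + eps
    return float(1.0 - np.sum(dp_c * ds_c) / denom)

def total_loss(pred, target, lam_align=0.1, lam_energy=0.1):
    """Data term (MSE) plus weighted physics terms evaluated on the forecast.
    Columns are assumed to be [pressure, u, v]; weights are hypothetical."""
    mse = float(np.mean((pred - target) ** 2))
    p, u, v = pred[:, 0], pred[:, 1], pred[:, 2]
    return (mse
            + lam_align * pressure_wind_alignment_loss(p, u, v)
            + lam_energy * energy_smoothness_loss(u, v))

# Toy forecast: 4 time steps of [pressure (hPa), u (m/s), v (m/s)].
forecast = np.array([[1000.0, 1.0, 0.0],
                     [998.0, 3.0, 0.0],
                     [997.0, 4.0, 0.0],
                     [993.0, 8.0, 0.0]])
print("total loss vs. itself:", total_loss(forecast, forecast))
```

Because the physics terms depend only on the forecast, they act as regularisers: even a forecast that matches the targets exactly still incurs a penalty if its wind energy jumps sharply between steps.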