Extreme Weather Bench: A framework and benchmark for evaluation of high-impact weather

arXiv cs.LG / 5/5/2026


Key Points

  • Extreme Weather Bench (EWB) is introduced as a new community-driven benchmark suite to evaluate AI and numerical weather prediction (NWP) models on high-impact weather events.
  • It offers standardized case studies across multiple spatial and temporal scales, along with observational data, impact-based metrics, and open-source code.
  • The framework aims to improve model validation and verification by enabling consistent, public comparisons across models, especially for hazards that matter to the general public.
  • EWB is positioned as an evolving, free, open-source system that will add new phenomena, test cases, and metrics in collaboration with the global weather and forecast verification community.

Abstract

Forecasting the wide variety of high-impact weather events experienced globally is a challenge for both Artificial Intelligence (AI) and Numerical Weather Prediction (NWP) models, and it is critical that such models be properly verified before deployment. Although AI weather models are rapidly evolving, much of their evaluation is currently done either at global scale or by hand-picking a small number of case studies or a region. A widely used open-source benchmark suite focusing on high-impact weather will help to drive the science forward for weather models at all scales, as such benchmarks have for other AI fields. Here we introduce Extreme Weather Bench (EWB), a new community-driven benchmark suite that facilitates model validation and verification on a variety of high-impact hazards that matter to people around the globe. EWB provides a standard set of case studies (spanning multiple spatial and temporal scales and different parts of the weather spectrum), observational data, impact-based metrics, and open-source code for users to evaluate their models. Verifying that a model works against a standard set of case studies, especially events that are high-impact for the general public, is a key piece of improving the trustworthiness of AI models. EWB will help to drive the science forward for all weather models by enabling true comparisons across models and by evaluating models on specific high-impact phenomena through case studies. EWB is a free, open-source, community-driven system and will continue to evolve to include additional phenomena, test cases, and metrics in collaboration with the worldwide weather and forecast verification community.