Towards Scaling Law Analysis For Spatiotemporal Weather Data

arXiv cs.LG / 4/8/2026


Key Points

  • The paper extends compute-optimal neural scaling-law analysis from the single-step objectives typical of NLP and CV to autoregressive spatiotemporal weather forecasting with long-horizon rollouts.
  • It introduces an evaluation that tracks how prediction error is distributed across physically disparate channels and how its growth rate changes as the forecast horizon increases.
  • The authors test whether power-law scaling holds for globally pooled test error across rollout lengths, and how the fit varies jointly with horizon and channel along parameter, data, and compute scaling axes.
  • Results show strong cross-channel and cross-horizon heterogeneity: globally pooled scaling can look favorable even while many individual channels degrade at late lead times.
  • The study outlines practical implications for using weighted objectives, horizon-aware training curricula, and more informed resource allocation across outputs during model development.
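The pooled-versus-per-channel contrast in the points above can be illustrated with a minimal sketch. It assumes a pure power law, err ≈ a·N^(−α), fitted by least squares in log-log space with no irreducible-error offset, and it uses an unweighted mean of raw errors across channels as the "pooled" metric. The channel names (`t2m`, `u10`, `tp`), model sizes, and error values are synthetic and hypothetical, and `fit_power_law` is an illustrative helper, not the paper's fitting procedure.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y ~= a * x**(-alpha) by linear least squares on (log x, log y).

    Returns (a, alpha). Assumes a pure power law with no irreducible-error term.
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    slope, intercept = np.polyfit(log_x, log_y, 1)
    return float(np.exp(intercept)), float(-slope)


# Hypothetical model sizes (parameter counts) and synthetic per-channel test errors.
params = np.array([1e6, 1e7, 1e8, 1e9])
per_channel_err = {
    "t2m": 2.0 * params ** -0.20,  # improves steadily with scale
    "u10": 1.5 * params ** -0.10,  # improves more slowly
    "tp": 0.05 * params ** 0.05,   # gets worse with scale (a "degrading" channel)
}

# "Pooled" metric: unweighted mean of raw errors across channels (an assumption).
pooled = np.mean(np.stack(list(per_channel_err.values())), axis=0)

# Positive alpha means error falls as parameter count grows.
alphas = {name: fit_power_law(params, err)[1] for name, err in per_channel_err.items()}
alphas["pooled"] = fit_power_law(params, pooled)[1]

for name, alpha in alphas.items():
    print(f"{name:>6}: alpha = {alpha:+.3f}")
```

With these made-up numbers the pooled exponent is positive even though the hypothetical `tp` channel has a negative exponent, which is the kind of masking described above. Because raw errors are averaged, large-magnitude channels dominate the pooled fit; normalizing or weighting channels first would change the result.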

Abstract

Compute-optimal scaling laws are relatively well studied for NLP and CV, where objectives are typically single-step and targets are comparatively homogeneous. Weather forecasting is harder to characterize in the same framework: autoregressive rollouts compound errors over long horizons, outputs couple many physical channels with disparate scales and predictability, and globally pooled test metrics can disagree sharply with the per-channel, late-lead behavior implied by short-horizon training. We extend neural scaling analysis for autoregressive weather forecasting from single-step training loss to long rollouts and per-channel metrics. We quantify (1) how prediction error is distributed across channels and how its growth rate evolves with forecast horizon, (2) whether power-law scaling holds for globally pooled test error and how the fit changes with rollout length, and (3) how that fit varies jointly with horizon and channel along parameter-, data-, and compute-based scaling axes. We find strong cross-channel and cross-horizon heterogeneity: pooled scaling can look favorable while many channels degrade at late leads. We discuss implications for weighted objectives, horizon-aware curricula, and resource allocation across outputs.
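For item (1), one simple way to summarize where rollout error sits and how fast it grows is to normalize each channel's RMSE by a per-channel scale, take each channel's share of the total at every lead time, and compute a per-step log growth rate. This is a minimal sketch under stated assumptions: the climatological-standard-deviation normalization, the log-ratio definition of growth rate, the helper `error_share_and_growth`, and all numbers are hypothetical and not taken from the paper.

```python
import numpy as np


def error_share_and_growth(rmse, clim_std):
    """Summarize how rollout error is spread across channels and how fast it grows.

    rmse:     array (n_leads, n_channels), RMSE at each lead time for each channel.
    clim_std: array (n_channels,), per-channel scale used to make channels with
              different units comparable (an assumed normalization).
    Returns (share, growth):
      share[t, c]  fraction of the normalized error at lead t carried by channel c;
      growth[t, c] log-ratio of normalized error between lead t+1 and lead t.
    """
    norm = np.asarray(rmse, dtype=float) / np.asarray(clim_std, dtype=float)
    share = norm / norm.sum(axis=1, keepdims=True)
    growth = np.diff(np.log(norm), axis=0)
    return share, growth


# Hypothetical RMSE for 3 channels over 4 lead times; synthetic numbers only.
rmse = np.array([
    [0.5, 1.0, 0.2],
    [0.7, 1.3, 0.3],
    [1.0, 1.6, 0.5],
    [1.4, 1.9, 0.8],
])
clim_std = np.array([5.0, 10.0, 1.0])

share, growth = error_share_and_growth(rmse, clim_std)
print(np.round(share, 3))
print(np.round(growth, 3))
```

Dividing by a per-channel scale keeps channels with large physical units from dominating the shares. The log-ratio growth rate is unaffected by that scaling, because the per-channel factor cancels between consecutive lead times.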