Rolling-Origin Validation Reverses Model Rankings in Multi-Step PM10 Forecasting: XGBoost, SARIMA, and Persistence
arXiv cs.LG / 3/24/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The study argues that many air-quality forecasting papers overstate ML gains by relying on static train/test splits and omitting persistence baselines, practices that obscure how much operational value a model actually delivers under routine re-training.
- Using 2,350 daily PM10 observations (2017–2024) from a southern Europe urban background station, the authors compare XGBoost and SARIMA against persistence under both a static split and a rolling-origin protocol with monthly updates.
- Under static evaluation, XGBoost looks strongest at 1–7 day horizons, but rolling-origin testing with monthly re-fitting reverses this ranking: XGBoost is no longer consistently better than persistence at short to intermediate lead times.
- SARIMA maintains positive persistence-relative skill across the full forecast horizon range, indicating more stable performance when models are updated regularly.
- The paper recommends using rolling-origin, persistence-referenced skill profiles and a “predictability horizon” metric to identify which methods remain reliable at each lead time in operational settings.
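The recommended protocol can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the model interface `fit_predict`, the skill definition (1 minus the MAE ratio against persistence), and the re-fit step are assumptions chosen to match the description above (rolling origins, persistence-referenced skill per horizon, and a "predictability horizon" defined as the longest lead time with positive skill).

```python
import numpy as np

def persistence_forecast(history, horizons):
    # Persistence baseline: the last observed value, repeated for every lead time.
    return np.full(len(horizons), history[-1])

def rolling_origin_skill(series, fit_predict, horizons, initial_train, step=30):
    """Rolling-origin evaluation: advance the forecast origin every `step`
    observations (roughly monthly for daily data), re-fit on all data up to
    the origin, forecast each horizon, and score against persistence.
    `fit_predict(train, horizons)` is a hypothetical model interface that
    returns one forecast per requested horizon."""
    n_h = len(horizons)
    err_model = [[] for _ in range(n_h)]
    err_pers = [[] for _ in range(n_h)]
    for origin in range(initial_train, len(series) - max(horizons), step):
        train = series[:origin]
        fc_model = fit_predict(train, horizons)
        fc_pers = persistence_forecast(train, horizons)
        for i, h in enumerate(horizons):
            actual = series[origin + h - 1]  # h-step-ahead verification value
            err_model[i].append(abs(fc_model[i] - actual))
            err_pers[i].append(abs(fc_pers[i] - actual))
    # Persistence-referenced skill per horizon: positive means the model
    # beats persistence at that lead time.
    return np.array([1.0 - np.mean(em) / np.mean(ep)
                     for em, ep in zip(err_model, err_pers)])

def predictability_horizon(skill, horizons):
    # Longest lead time at which skill is still positive (0 if none).
    positive = [h for s, h in zip(skill, horizons) if s > 0]
    return max(positive) if positive else 0
```

Plugging in any forecaster (an XGBoost wrapper, a SARIMA fit, or the mean of the training window) yields a skill-vs-horizon profile; comparing where each model's profile crosses zero is exactly the ranking exercise the paper argues static splits get wrong.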