Benchmarking Scientific Machine Learning Models for Air Quality Data
arXiv cs.LG / 2026/3/24
Key points
- The study benchmarks multiple classical, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting (PM2.5 and O3) in North Texas using EPA daily observations from 2022–2024.
- It builds standardized, city-level, lag-wise forecasting datasets for forecasting horizons of LAG ∈ {1, 7, 14, 30} days, and evaluates models with chronological train/test splits.
- Deep-learning models (MLP and LSTM) outperform simpler baselines (linear regression and SARIMAX) on the evaluated error metrics, such as MAE and RMSE.
- Physics-guided variants (MLP+Physics, LSTM+Physics) incorporate the EPA breakpoint-based AQI formulation as a consistency constraint in a weighted loss, which improves stability and yields physically consistent pollutant–AQI relationships.
- The largest gains from physics guidance appear for short-horizon predictions and for specific pollutants (notably PM2.5 and O3), yielding a region-specific guideline for model selection.
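The lag-wise dataset construction, chronological split, and error metrics summarized above can be illustrated with a small sketch. It assumes a single-feature, direct-horizon setup in which the value on day t is used to predict the value on day t + LAG; the study's actual feature set is not described in this summary. All function names are hypothetical, the series is synthetic rather than EPA data, and the persistence baseline is only an illustrative reference point, not one of the paper's models.

```python
import math
from typing import List, Sequence, Tuple


def make_lag_dataset(series: Sequence[float], lag: int) -> Tuple[List[float], List[float]]:
    """Pair each day's value x[t] with the target x[t + lag].

    Assumption: single feature, one dataset per LAG horizon.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be in [1, len(series) - 1]")
    return list(series[:-lag]), list(series[lag:])


def chronological_split(xs, ys, train_frac: float = 0.8):
    """Split by time order without shuffling.

    Training pairs precede test pairs, so no test target is seen in training.
    """
    cut = int(len(xs) * train_frac)
    return (xs[:cut], ys[:cut]), (xs[cut:], ys[cut:])


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


if __name__ == "__main__":
    # Synthetic stand-in for three years of daily city-level PM2.5 (not EPA data).
    series = [10 + 5 * math.sin(2 * math.pi * d / 365) for d in range(3 * 365)]
    for lag in (1, 7, 14, 30):
        xs, ys = make_lag_dataset(series, lag)
        _, (test_x, test_y) = chronological_split(xs, ys)
        # Illustrative persistence baseline: predict x[t + lag] with x[t].
        print(lag, round(mae(test_x, test_y), 3), round(rmse(test_x, test_y), 3))
```

Splitting by time rather than at random matters for forecasting: a shuffled split would let the model train on days that fall after the test days, which overstates accuracy at longer horizons.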
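The EPA breakpoint formulation referred to above maps a concentration C in the category [C_lo, C_hi] to an index via I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo. A minimal sketch of how such a formula could serve as a weighted-loss consistency term follows. Several parts are assumptions, not the paper's confirmed method: the model is taken to output both a PM2.5 concentration and an AQI, the penalty weight `lambda_phys` is illustrative, and the breakpoint table reproduces EPA's 2024 revised 24-hour PM2.5 categories up to 225.4 µg/m³, which should be checked against the EPA AQI Technical Assistance Document before use. Ozone would need EPA's separate 8-hour ozone table, which is not reproduced here.

```python
from typing import List, Sequence, Tuple

# (C_lo, C_hi, I_lo, I_hi) for 24-hour PM2.5 in µg/m³.
# Assumption: EPA's 2024 revised table, truncated at 225.4; verify before use.
PM25_BREAKPOINTS: List[Tuple[float, float, int, int]] = [
    (0.0, 9.0, 0, 50),
    (9.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 125.4, 151, 200),
    (125.5, 225.4, 201, 300),
]


def aqi_from_concentration(c: float, table=PM25_BREAKPOINTS) -> float:
    """Piecewise-linear EPA formula, left unrounded so it can sit inside a loss.

    Simplification: EPA truncates C and rounds the AQI; here a value in a small
    gap between categories (e.g. 9.05) is evaluated on the next category's line.
    """
    if c < 0:
        raise ValueError("concentration must be non-negative")
    for c_lo, c_hi, i_lo, i_hi in table:
        if c <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo
    raise ValueError("concentration is above this sketch's breakpoint table")


def physics_guided_loss(
    aqi_pred: Sequence[float],
    aqi_true: Sequence[float],
    conc_pred: Sequence[float],
    lambda_phys: float = 0.1,
) -> float:
    """Data MSE plus a weighted penalty for disagreeing with the EPA formula.

    Hypothetical setup: the penalty pulls the predicted AQI toward the AQI
    implied by the model's own predicted concentration.
    """
    n = len(aqi_true)
    data = sum((p - t) ** 2 for p, t in zip(aqi_pred, aqi_true)) / n
    phys = sum((p - aqi_from_concentration(c)) ** 2 for p, c in zip(aqi_pred, conc_pred)) / n
    return data + lambda_phys * phys


if __name__ == "__main__":
    print(aqi_from_concentration(45.45))  # mid-"Unhealthy for Sensitive Groups"
    print(physics_guided_loss([60.0], [50.0], [9.0]))
```

In an actual MLP or LSTM, the same expressions would be written with the training framework's tensor operations so that gradients flow through the consistency term; plain Python is used here only to keep the sketch dependency-free.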