Benchmarking Scientific Machine Learning Models for Air Quality Data
arXiv cs.LG / 3/24/2026
Key Points
- The study benchmarks multiple classical, machine-learning, and deep-learning approaches for multi-horizon AQI forecasting (PM2.5 and O3) in North Texas using EPA daily observations from 2022–2024.
- It builds standardized, city-level, lag-wise forecasting datasets for forecasting horizons of LAG ∈ {1, 7, 14, 30} days and evaluates models using chronological train–test splits.
- Deep-learning models (MLP and LSTM) outperform simpler baselines (linear regression and SARIMAX) across the evaluated error metrics, such as MAE and RMSE.
- Physics-guided variants (MLP+Physics, LSTM+Physics) incorporate the EPA breakpoint-based AQI formulation as a weighted consistency term in the loss, which improves training stability and produces physically consistent pollutant–AQI relationships.
- The largest gains from physics guidance appear for short-horizon predictions and for specific pollutants (notably PM2.5 and O3), yielding a region-specific guideline for model selection.
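The lag-wise datasets, chronological splits, and MAE/RMSE comparison described in the points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the helper names (`make_lagged_dataset`, `chronological_split`), the 7-day input window, the 80/20 split, and the synthetic series are all hypothetical, and LAG is treated here as the number of days between the last input observation and the target.

```python
import numpy as np

def make_lagged_dataset(series, lag, window=7):
    """Hypothetical helper: inputs are the `window` daily values ending on day t,
    and the target is the value observed `lag` days after day t."""
    series = np.asarray(series, dtype=float)
    xs, ys = [], []
    for t in range(window - 1, len(series) - lag):
        xs.append(series[t - window + 1 : t + 1])
        ys.append(series[t + lag])
    return np.array(xs), np.array(ys)

def chronological_split(x, y, train_frac=0.8):
    """Split by time order (no shuffling): earlier rows train, later rows test."""
    cut = int(len(x) * train_frac)
    return x[:cut], x[cut:], y[:cut], y[cut:]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Usage on a synthetic seasonal series (NOT EPA data) with a least-squares
# linear-regression baseline, repeated for each horizon listed in the study.
rng = np.random.default_rng(0)
days = np.arange(3 * 365)
pm25 = 10 + 3 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 1, days.size)

for lag in (1, 7, 14, 30):
    x, y = make_lagged_dataset(pm25, lag)
    x_tr, x_te, y_tr, y_te = chronological_split(x, y)
    design_tr = np.column_stack([x_tr, np.ones(len(x_tr))])  # append intercept column
    coef, *_ = np.linalg.lstsq(design_tr, y_tr, rcond=None)
    pred = np.column_stack([x_te, np.ones(len(x_te))]) @ coef
    print(f"LAG={lag:2d}  MAE={mae(y_te, pred):.3f}  RMSE={rmse(y_te, pred):.3f}")
```

Because the split follows time order, the targets of the last few training rows fall on days that also appear in the test rows; dropping LAG rows at the boundary is one way to avoid that overlap, though whether the study does so is not stated here.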
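The physics-guided idea in the points above combines the EPA breakpoint formula, I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo, with a weighted consistency term in the training loss. The sketch below is one plausible way to express that, not the authors' implementation. Assumptions: the PM2.5 table is EPA's 2024 revision as we understand it, cut off at AQI 300 (the study's 2022–2024 data may have been scored with the earlier table, and O3 would need its own table); the model is assumed to output both a concentration and an AQI; the names `aqi_pm25`, `aqi_pm25_smooth`, `physics_guided_loss` and the weight `lam=0.5` are hypothetical; and the linear-interpolation relaxation used inside the loss is our choice.

```python
import math
import numpy as np

# Assumed 24-hour PM2.5 breakpoints (ug/m3) from EPA's 2024 revision, stopping at AQI 300.
# Rows are (C_lo, C_hi, I_lo, I_hi). Verify against EPA's official table before relying on them.
PM25_BREAKPOINTS = [
    (0.0, 9.0, 0, 50),
    (9.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 125.4, 151, 200),
    (125.5, 225.4, 201, 300),
]

def aqi_pm25(c):
    """Reported AQI: truncate to 0.1 ug/m3, apply
    I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo, then round half up."""
    c = math.floor(round(c * 10, 6)) / 10  # round() only removes float noise before truncating
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return math.floor((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo + 0.5)
    raise ValueError(f"PM2.5 concentration {c} is outside this truncated table")

# Continuous relaxation for use inside a loss: linear interpolation through every
# breakpoint (it also bridges the 0.1 ug/m3 gaps and clamps values past the table).
_XP = np.array([v for row in PM25_BREAKPOINTS for v in row[:2]])
_FP = np.array([v for row in PM25_BREAKPOINTS for v in row[2:]], dtype=float)

def aqi_pm25_smooth(c):
    """Piecewise-linear AQI from concentration, without truncation or rounding."""
    return np.interp(c, _XP, _FP)

def physics_guided_loss(aqi_pred, aqi_true, conc_pred, lam=0.5):
    """Data MSE on AQI plus a lam-weighted penalty for disagreeing with the breakpoint formula."""
    aqi_pred = np.asarray(aqi_pred, dtype=float)
    data_term = np.mean((aqi_pred - np.asarray(aqi_true, dtype=float)) ** 2)
    consistency = np.mean((aqi_pred - aqi_pm25_smooth(conc_pred)) ** 2)
    return float(data_term + lam * consistency)

print(aqi_pm25(20.0))  # → 71
print(physics_guided_loss([60.0], [60.0], [9.0]))  # fits the label but breaks the formula
```

NumPy here only evaluates the loss value; in an actual MLP or LSTM training loop the same expression would be written in an autodiff framework so the consistency term contributes gradients. Raising `lam` trades fit to observed AQI for agreement with the breakpoint formula.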