A temporal deep learning framework for calibration of low-cost air quality sensors

arXiv cs.LG / 4/24/2026


Key Points

  • The study introduces a deep learning calibration framework for low-cost air quality sensors (LCS), targeting PM2.5, PM10, and NO2 measurements that suffer from drift, cross-sensitivity, and device-to-device variability.
  • It trains an LSTM model using co-located reference data from the OxAria network in Oxford, UK, explicitly modeling temporal dependencies and delayed environmental effects.
  • Compared with a Random Forest baseline that treats each observation independently, the sequence-based approach achieves higher R^2 values across training, validation, and test sets for all three pollutants.
  • The authors design a feature set with time-lagged parameters, harmonic encodings, and interaction terms to improve generalization to unseen time windows.
  • Validation against the Equivalence Spreadsheet Tool 3.1 shows regulatory compliance, with expanded uncertainties reported as 22.11% for NO2, 12.42% for PM10, and 9.1% for PM2.5.
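The three feature families named above (time-lagged parameters, harmonic encodings, interaction terms) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_features`, the inputs (`raw`, `temp`, `rh`, `hours`), the lag choices, and the humidity interaction are all assumptions, since the summary does not specify which variables or lags the paper uses.

```python
import numpy as np


def build_features(raw, temp, rh, hours, lags=(1, 2, 3)):
    """Assemble lagged, harmonic, and interaction features (hypothetical layout).

    raw, temp, rh: 1-D arrays of equally spaced sensor readings (assumed inputs).
    hours: 1-D array of hour-of-day values in [0, 24).
    Returns a 2-D array with one row per time step that has every lag available.
    """
    raw = np.asarray(raw, dtype=float)
    temp = np.asarray(temp, dtype=float)
    rh = np.asarray(rh, dtype=float)
    hours = np.asarray(hours, dtype=float)
    max_lag = max(lags)
    n = len(raw)

    # Current readings, trimmed so every row also has its lagged values.
    cols = [raw[max_lag:], temp[max_lag:], rh[max_lag:]]

    # Time-lagged parameters: the raw signal at t - k for each lag k.
    for k in lags:
        cols.append(raw[max_lag - k : n - k])

    # Harmonic encoding: maps hour of day onto a circle so 23:00 sits next to 00:00.
    angle = 2.0 * np.pi * hours[max_lag:] / 24.0
    cols.append(np.sin(angle))
    cols.append(np.cos(angle))

    # Interaction term: raw signal times relative humidity (an assumed pairing).
    cols.append(raw[max_lag:] * rh[max_lag:])

    return np.column_stack(cols)
```

With the default lags, each row holds nine columns: three current readings, three lagged values, two harmonic terms, and one interaction term. The first `max(lags)` time steps are dropped because their lagged values do not exist.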

Abstract

Low-cost air quality sensors (LCS) provide a practical alternative to expensive regulatory-grade instruments, making dense urban monitoring networks possible. Yet their adoption is limited by calibration challenges, including sensor drift, environmental cross-sensitivity, and variability in performance from device to device. This work presents a deep learning framework for calibrating LCS measurements of PM_{2.5}, PM_{10}, and NO_2 using a Long Short-Term Memory (LSTM) network, trained on co-located reference data from the OxAria network in Oxford, UK. Unlike the Random Forest (RF) baseline, which treats each observation independently, the proposed approach captures temporal dependencies and delayed environmental effects through sequence-based learning, achieving higher R^2 values across training, validation, and test sets for all three pollutants. A feature set is constructed combining time-lagged parameters, harmonic encodings, and interaction terms to improve generalization on unseen temporal windows. Validation of unseen calibrated values against the Equivalence Spreadsheet Tool 3.1 demonstrates regulatory compliance with expanded uncertainties of 22.11% for NO_2, 12.42% for PM_{10}, and 9.1% for PM_{2.5}.
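The contrast between per-observation models and sequence-based learning comes down to how the input is shaped. A Random Forest sees one feature row per prediction, while an LSTM sees a window of consecutive rows. The sketch below shows one common way to build such windows; the helper name `make_sequences` and the 24-step window are illustrative assumptions, not values reported in the abstract.

```python
import numpy as np


def make_sequences(features, target, window=24):
    """Turn a time-ordered feature matrix into overlapping windows.

    features: array of shape (n_steps, n_features), ordered in time.
    target: array of shape (n_steps,), e.g. the co-located reference reading.
    window: number of consecutive steps per sequence (assumed value).

    Returns X of shape (n_steps - window + 1, window, n_features) and y aligned
    to the last time step of each window.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_steps = features.shape[0]
    if n_steps < window:
        raise ValueError("need at least `window` time steps")

    # Each sample is a block of `window` consecutive rows.
    X = np.stack([features[i : i + window] for i in range(n_steps - window + 1)])

    # The label for each window is the reference value at the window's final step.
    y = target[window - 1 :]
    return X, y
```

The resulting `(samples, time steps, features)` layout is the input shape that recurrent layers in common deep learning libraries expect. Splitting the windows chronologically, rather than at random, keeps the test set on unseen temporal windows.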