Conformal PM2.5 Mapping Under Spatial Covariate Shift: Satellite-Reanalysis Fusion for Africa's Green Industrial Transition

arXiv cs.LG / 4/28/2026


Key Points

  • The study proposes a satellite–reanalysis PM2.5 fusion system for Africa’s air-quality monitoring, trained on over 2.06 million records from 404 ground locations across 29 countries.
  • It uses LightGBM with leakage-resistant spatial cross-validation and conformal prediction to both improve robustness and quantify where predictions are geographically applicable.
  • In 5-fold location-grouped spatial cross-validation, the model reports RMSE 30.83±5.07 µg/m³, MAE 14.54±1.66 µg/m³, and R² 0.134±0.023, with the low R² attributed to real geographic generalization challenges rather than model failure.
  • Split conformal prediction targeting 90% marginal coverage under-covers in East Africa (prediction interval coverage probability, PICP, of 65.3% vs. nominal 90%), consistent with medium-strength spatial covariate shift indicated by Kolmogorov-Smirnov (KS) statistics on humidity and satellite planetary boundary layer height (PBLH).
  • The authors translate uncertainty into operational outputs via regional reliability flags and a monitor prioritization score to guide expansion toward high-burden, currently unmonitored populations, supporting SDG-aligned green industrial transition goals.
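
The location-grouped cross-validation mentioned in the key points can be illustrated with a minimal sketch. It is not the authors' code: the function name location_grouped_folds, the toy station IDs, and the random assignment of stations to folds are all assumptions made for illustration. The point it shows is that whole monitoring locations, not individual records, are held out, so no station contributes to both training and evaluation.

```python
import numpy as np

def location_grouped_folds(location_ids, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no location is shared.

    Hypothetical helper: stations are shuffled and divided into n_splits
    groups, and every record of a held-out station goes to the test side.
    """
    location_ids = np.asarray(location_ids)
    unique_locations = np.unique(location_ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(unique_locations)
    # Split the shuffled *locations* (not the records) into folds.
    for held_out in np.array_split(shuffled, n_splits):
        test_mask = np.isin(location_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Toy data (assumed): 20 stations with 50 records each.
location_ids = np.repeat(np.arange(20), 50)
folds = list(location_grouped_folds(location_ids, n_splits=5))

for train_idx, test_idx in folds:
    shared = np.intersect1d(location_ids[train_idx], location_ids[test_idx])
    print(f"train={train_idx.size} test={test_idx.size} shared stations={shared.size}")
```

A random record-level split would instead place records from the same station on both sides, which is the leakage that inflates the random-split R² figures the study contrasts itself with. In practice, a library splitter such as scikit-learn's GroupKFold serves the same purpose.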

Abstract

Africa's green industrialisation imperative demands reliable infrastructure for monitoring air quality. We present a satellite-reanalysis PM2.5 fusion system trained on 2,068,901 records from 404 monitoring locations in 29 African countries (OpenAQ, 2017-2022), combining LightGBM with leakage-resistant spatial cross-validation and conformal prediction to quantify prediction uncertainty and the geographic limits of applicability. Under 5-fold location-grouped spatial cross-validation, LightGBM achieves RMSE = 30.83 ± 5.07 µg/m³, MAE = 14.54 ± 1.66 µg/m³, R² = 0.134 ± 0.023, and macro F1 = 0.336 ± 0.018. This R² is substantially below random-split benchmarks (>0.90) but reflects true geographic generalisation difficulty rather than model failure. Split conformal prediction targeting 90% marginal coverage reveals severe degradation in East Africa (actual PICP = 65.3% vs. nominal 90%), consistent with medium-strength covariate shift (humidity KS = 0.2237, sat_pblh KS = 0.2558). We operationalise these findings through regional reliability flags (High/Medium/Low/Unreliable) and a monitor prioritisation score directing infrastructure expansion toward the highest-burden unmonitored populations, directly supporting Africa's green industrial transition and SDGs 3.9, 7.1.2, 9, 11.6.2, and 13.
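
To make the coverage comparison in the abstract concrete, here is a minimal sketch of split conformal prediction with a per-region PICP check. It is an illustration under stated assumptions, not the paper's pipeline: the helper names split_conformal_halfwidth and picp_by_region are hypothetical, the data are synthetic, and the "shifted" region is simulated simply by giving it larger residuals, as a stand-in for a region where the model generalises worse.

```python
import numpy as np

def split_conformal_halfwidth(cal_true, cal_pred, alpha=0.10):
    """Half-width q so that [pred - q, pred + q] targets 1 - alpha coverage.

    Uses absolute residuals on a held-out calibration set and the
    finite-sample rank ceil((n + 1) * (1 - alpha)).
    """
    scores = np.sort(np.abs(np.asarray(cal_true) - np.asarray(cal_pred)))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the calibration set is too small, the interval is unbounded.
    return float(scores[k - 1]) if k <= n else float("inf")

def picp_by_region(y_true, y_pred, halfwidth, regions):
    """Fraction of observations falling inside the interval, per region."""
    covered = np.abs(np.asarray(y_true) - np.asarray(y_pred)) <= halfwidth
    regions = np.asarray(regions)
    return {str(r): float(covered[regions == r].mean()) for r in np.unique(regions)}

# Synthetic example (all values assumed, in µg/m³).
rng = np.random.default_rng(0)
n = 2000
cal_pred = rng.uniform(10, 60, n)
cal_true = cal_pred + rng.normal(0, 5, n)

test_pred = rng.uniform(10, 60, 2 * n)
regions = np.repeat(["in_distribution", "shifted"], n)
# The shifted region gets residuals three times larger than calibration.
noise_sd = np.where(regions == "shifted", 15.0, 5.0)
test_true = test_pred + rng.normal(0, 1, 2 * n) * noise_sd

q = split_conformal_halfwidth(cal_true, cal_pred, alpha=0.10)
coverage = picp_by_region(test_true, test_pred, q, regions)
print(f"half-width = {q:.2f}")
print(coverage)
```

The 90% guarantee of split conformal prediction is marginal and relies on calibration and test data being exchangeable. When a region departs from the calibration distribution, its regional coverage can fall well below the nominal level even though the guarantee is not violated on average, which is the pattern the abstract reports for East Africa.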