Anomaly Detection in Soil Heavy Metal Contamination Using Unsupervised Learning for Environmental Risk Assessment

arXiv cs.LG / 5/1/2026

Key Points

  • A new study uses unsupervised machine learning to detect unusual soil heavy-metal contamination patterns in Ghana’s Central Region across 12 waste sites and residential controls.
  • Isolation Forest and PCA reconstruction error each flagged 12 anomalous samples (15.4% of 78), while DBSCAN found no density-isolated noise points, highlighting differences across anomaly detectors.
  • Combining the methods yielded six consensus anomalies (7.7%), all located at site S3. Anomalous samples had approximately 70–80% higher mean Hazard Index (HI) than normal samples, and all six consensus anomalies exceeded the HI = 1 threshold.
  • The paper reports a strong positive relationship between PCA reconstruction error and HI (r≈0.8), and categorizes three anomaly types: extreme Cu enrichment at S3, unusually low Ni at S4/S5, and moderate multi-metal (Pb–Zn) co-elevation at S9–S12.
  • The authors argue the unsupervised approach offers more granular and objective site prioritization than aggregate health-risk indices alone for environmental management.
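The consensus step described above can be sketched as a simple vote across detectors. This is a minimal illustration, not the authors' procedure: the summary does not state the exact consensus rule, so the sketch assumes a sample counts as robust when at least two detectors flag it (DBSCAN flagged nothing, so requiring all three would leave no anomalies). The sample IDs are hypothetical placeholders chosen only to reproduce the reported counts.

```python
def consensus(flags_by_method, min_votes):
    """Return the sample IDs flagged by at least `min_votes` detectors."""
    votes = {}
    for flagged in flags_by_method.values():
        for sample_id in flagged:
            votes[sample_id] = votes.get(sample_id, 0) + 1
    return {s for s, v in votes.items() if v >= min_votes}


N_SAMPLES = 78  # sample count reported in the summary

# Hypothetical sample IDs (NOT the study's data), sized to match the
# reported counts: 12 flags each for two detectors, none for DBSCAN.
flags = {
    "isolation_forest": set(range(0, 12)),
    "pca_reconstruction": set(range(6, 18)),
    "dbscan": set(),
}

# Assumed rule: agreement between at least two detectors.
robust = consensus(flags, min_votes=2)

for name, flagged in flags.items():
    print(f"{name}: {len(flagged)} flagged ({100 * len(flagged) / N_SAMPLES:.1f}%)")
print(f"consensus: {len(robust)} flagged ({100 * len(robust) / N_SAMPLES:.1f}%)")
```

With these placeholder sets, the percentages follow directly from the counts: 12 of 78 is 15.4% and 6 of 78 is 7.7%, matching the figures quoted in the key points.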

Abstract

Soil contamination by heavy metals poses a persistent environmental and public health concern in rapidly urbanising regions of Ghana, particularly at unregulated waste disposal sites. This study applies an unsupervised machine learning framework to detect and characterise anomalous heavy metal contamination patterns in soils from twelve waste sites and residential controls in the Central Region of Ghana. Concentrations of eight metals (As, Cd, Cr, Cu, Hg, Ni, Pb, Zn) were analysed alongside standard health risk indices, including the Hazard Index (HI) and Incremental Lifetime Cancer Risk (ILCR). Isolation Forest and PCA reconstruction error each identified 12 anomalous samples (15.4% of 78 samples), while DBSCAN detected no density-isolated noise points. A consensus approach isolated six robust anomalies (7.7%), all spatially concentrated at a single site (S3). Anomalies exhibited approximately 70–80% higher mean HI values than normal samples, with all consensus anomalies exceeding the HI = 1 threshold. PCA reconstruction error showed a strong positive association with HI (r ≈ 0.8), indicating consistency between multivariate deviation and health risk. Three distinct anomaly types were identified: extreme Cu enrichment at S3, anomalously low Ni at S4/S5, and moderate multi-metal (Pb–Zn) co-elevation at S9–S12. The results demonstrate that unsupervised machine learning provides granular, objective insight beyond aggregate indices, enabling targeted site prioritisation and risk-informed environmental management.
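To make the idea of PCA reconstruction error concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions rather than the paper's pipeline: the number of retained components (2), the standardisation step, and the rule of flagging the 12 highest-error samples are all assumptions, and the data are synthetic stand-ins for a 78-sample by 8-metal table.

```python
import numpy as np


def pca_reconstruction_error(X, n_components):
    """Squared error per row after projecting onto the top principal axes."""
    # Standardise each column so no single metal dominates by scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes of the standardised data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:n_components].T      # shape: (n_features, n_components)
    Z_hat = Z @ W @ W.T          # project onto the subspace and map back
    return np.sum((Z - Z_hat) ** 2, axis=1)


rng = np.random.default_rng(0)

# Synthetic data (NOT the study's measurements): two latent factors
# drive all eight columns, plus a small amount of noise.
latent = rng.normal(size=(78, 2))
loadings = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
])
X = latent @ loadings + 0.1 * rng.normal(size=(78, 8))
X[0, 3] += 5.0  # plant one sample that breaks the shared pattern

errors = pca_reconstruction_error(X, n_components=2)

# Assumed flagging rule: the 12 samples with the largest error.
flagged = np.argsort(errors)[-12:]
print("highest-error sample:", int(np.argmax(errors)))
print("flagged samples:", sorted(flagged.tolist()))
```

A sample that follows the dominant covariance pattern is reconstructed almost exactly, while one that departs from it in a single metal leaves a large residual. Given a vector of per-sample HI values, `np.corrcoef(errors, hi)[0, 1]` would give the Pearson correlation between reconstruction error and HI of the kind the abstract reports.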