AI Navigate

GeoChemAD: Benchmarking Unsupervised Geochemical Anomaly Detection for Mineral Exploration

arXiv cs.LG / 3/16/2026


Key Points

  • GeoChemAD provides an open-source benchmark dataset derived from government geological surveys, covering eight subsets across diverse regions, sampling sources, and target elements to support reproducible mineral-exploration research.
  • The work reproduces and benchmarks a range of unsupervised anomaly detection methods, including statistical models, generative approaches, and transformer-based techniques, establishing strong baselines for comparison.
  • The authors introduce GeoChemFormer, a transformer-based framework that uses self-supervised pretraining to learn target-element-aware geochemical representations for spatial samples.
  • Extensive experiments show GeoChemFormer achieves superior performance and robustness across all eight subsets, improving anomaly-detection accuracy and generalization.
  • The dataset and framework lay the groundwork for reproducible research and future development in geochemical anomaly detection.

Abstract

Geochemical anomaly detection plays a critical role in mineral exploration, as deviations from regional geochemical baselines may indicate mineralization. Existing studies suffer from two key limitations: (1) single-region scenarios, which limit model generalizability; and (2) proprietary datasets, which make result reproduction infeasible. In this work, we introduce GeoChemAD, an open-source benchmark dataset compiled from government-led geological surveys, covering multiple regions, sampling sources, and target elements. The dataset comprises eight subsets representing diverse spatial scales and sampling conditions. To establish strong baselines, we reproduce and benchmark a range of unsupervised anomaly detection methods, including statistical models as well as generative and transformer-based approaches. Furthermore, we propose GeoChemFormer, a transformer-based framework that leverages self-supervised pretraining to learn target-element-aware geochemical representations for spatial samples. Extensive experiments demonstrate that GeoChemFormer consistently achieves superior and robust performance across all eight subsets, outperforming existing unsupervised methods in both anomaly detection accuracy and generalization capability. The proposed dataset and framework provide a foundation for reproducible research and future development in this direction.
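To make the idea of "deviations from regional geochemical baselines" concrete, here is a minimal sketch of one generic statistical approach: a robust z-score computed on log-transformed concentrations of a single element. This is an illustration only and is not taken from the paper; it does not represent any GeoChemAD baseline or GeoChemFormer. The function names, the threshold of 3.0, and the sample values are all hypothetical.

```python
import math
import statistics


def robust_anomaly_scores(concentrations):
    """Return a robust z-score per sample, computed on a log scale.

    The median of the log-concentrations stands in for the regional
    baseline, and the median absolute deviation (MAD) for its spread.
    Assumes every concentration is strictly positive.
    """
    logs = [math.log(c) for c in concentrations]
    baseline = statistics.median(logs)
    mad = statistics.median(abs(x - baseline) for x in logs)
    if mad == 0:
        # No spread at all: nothing can be called a deviation.
        return [0.0 for _ in logs]
    # 1.4826 rescales the MAD to match a standard deviation
    # when the data are normally distributed.
    scale = 1.4826 * mad
    return [abs(x - baseline) / scale for x in logs]


def flag_anomalies(concentrations, threshold=3.0):
    """Return indices of samples whose score exceeds the threshold.

    The default threshold is an illustrative assumption.
    """
    scores = robust_anomaly_scores(concentrations)
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical concentrations (ppm) for one target element.
samples_ppm = [12.0, 15.0, 11.0, 14.0, 13.0, 240.0, 12.5, 16.0]
print(flag_anomalies(samples_ppm))  # -> [5]
```

A univariate score like this ignores both spatial context and relationships between elements, which is the kind of gap that learned representations for spatial samples are meant to address.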