SpecTM: Spectral Targeted Masking for Trustworthy Foundation Models

arXiv cs.AI / 3/24/2026


Key Points

  • Foundation models for Earth observation are often trained with stochastic masking that may not enforce physics constraints, limiting trustworthiness for predictive uses such as public-health guidance.
  • The paper introduces SpecTM (Spectral Targeted Masking), a physics-informed pretraining objective that encourages targeted-band reconstruction using cross-spectral context.
  • SpecTM uses an adaptable multi-task self-supervised framework (band reconstruction, bio-optical index inference, and 8-day-ahead temporal prediction) to learn spectrally intrinsic representations.
  • On NASA PACE hyperspectral imagery over Lake Erie, used for microcystin concentration regression, SpecTM reaches R^2 = 0.695 (current week) and R^2 = 0.620 (8-day-ahead), outperforming all reported baselines and showing 2.2x better label efficiency under extreme label scarcity.
  • Ablation results indicate that targeted masking improves R^2 by 0.037 over random masking; the authors also state that the approach improves interpretability and enables physics-informed representation learning across EO domains.
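
To make the contrast between random and targeted masking concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the 10 nm band grid, the wavelength windows, the zero fill value, and all function names are assumptions chosen for illustration only. The idea shown is that random masking hides an arbitrary subset of bands, while targeted masking hides the bands inside chosen spectral windows so the model must reconstruct them from the remaining bands.

```python
import random


def random_band_mask(n_bands, mask_ratio, rng):
    """Baseline: hide a uniformly random subset of spectral bands."""
    n_masked = round(n_bands * mask_ratio)
    return sorted(rng.sample(range(n_bands), n_masked))


def targeted_band_mask(wavelengths_nm, target_windows_nm):
    """Hide every band whose center wavelength falls inside a target window."""
    return [
        i for i, wl in enumerate(wavelengths_nm)
        if any(lo <= wl <= hi for lo, hi in target_windows_nm)
    ]


def apply_mask(spectrum, masked_idx, fill=0.0):
    """Return (masked input, reconstruction targets) for one pixel spectrum."""
    masked = set(masked_idx)
    visible = [fill if i in masked else v for i, v in enumerate(spectrum)]
    targets = {i: spectrum[i] for i in masked_idx}
    return visible, targets


# Hypothetical band grid: 400-700 nm in 10 nm steps (not the PACE band set).
wavelengths = list(range(400, 701, 10))
# Hypothetical target windows, chosen only for illustration.
windows = [(610, 630), (660, 690)]
# Synthetic reflectance values standing in for one pixel's spectrum.
spectrum = [0.01 * i for i in range(len(wavelengths))]

targeted_idx = targeted_band_mask(wavelengths, windows)
visible, targets = apply_mask(spectrum, targeted_idx)
random_idx = random_band_mask(len(wavelengths), 0.25, random.Random(0))
```

In a pretraining loop, `visible` would be the encoder input and `targets` the values the reconstruction head is scored on. Swapping `targeted_idx` for `random_idx` gives the random-masking condition that the ablation compares against.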

Abstract

Foundation models are increasingly being developed for Earth observation (EO), yet they often rely on stochastic masking that does not explicitly enforce physics constraints, a critical trustworthiness limitation, particularly for predictive models that guide public health decisions. In this work, we propose SpecTM (Spectral Targeted Masking), a physics-informed masking design that encourages the reconstruction of targeted bands from cross-spectral context during pretraining. To achieve this, we developed an adaptable multi-task self-supervised learning (SSL) framework (band reconstruction, bio-optical index inference, and 8-day-ahead temporal prediction) that encodes spectrally intrinsic representations via joint optimization, and evaluated it on a downstream microcystin concentration regression task using NASA PACE hyperspectral imagery over Lake Erie. SpecTM achieves R^2 = 0.695 (current week) and R^2 = 0.620 (8-day-ahead), surpassing all baseline models by +34% (Ridge, 0.51) and +99% (SVR, 0.31), respectively. Our ablation experiments show that targeted masking improves predictions by +0.037 R^2 over random masking. Furthermore, SpecTM outperforms strong baselines with 2.2x better label efficiency under extreme label scarcity. SpecTM enables physics-informed representation learning across EO domains and improves the interpretability of foundation models.
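
The abstract says the three pretext tasks are trained "via joint optimization" but does not give the form of the objective. One common way to combine such tasks is a weighted sum of per-task losses, sketched below. The use of mean squared error for every task, the task names, and the weights are all assumptions for illustration, not details from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_ssl_loss(outputs, targets, weights):
    """Weighted sum of per-task losses (assumed form, not the paper's)."""
    per_task = {name: mse(outputs[name], targets[name]) for name in weights}
    total = sum(weights[name] * per_task[name] for name in weights)
    return total, per_task


# Hypothetical head outputs and targets for a single training sample.
outputs = {
    "band_reconstruction": [0.2, 0.4],   # reconstructed masked bands
    "index_inference": [1.0],            # predicted bio-optical index
    "temporal_prediction": [0.3, 0.3],   # predicted 8-day-ahead values
}
targets = {
    "band_reconstruction": [0.1, 0.4],
    "index_inference": [0.5],
    "temporal_prediction": [0.3, 0.5],
}
# Hypothetical task weights; in practice these would be tuned.
weights = {
    "band_reconstruction": 1.0,
    "index_inference": 0.5,
    "temporal_prediction": 0.5,
}

total, per_task = joint_ssl_loss(outputs, targets, weights)
```

Because a shared encoder would receive gradients from all three terms, the weights control how strongly each task shapes the learned representation.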