Trust-SSL: Additive-Residual Selective Invariance for Robust Aerial Self-Supervised Learning

arXiv cs.CV / 4/24/2026

Key Points

  • The paper introduces Trust-SSL, an aerial-image self-supervised learning method designed to stay robust when augmentations severely degrade semantic content (e.g., haze, blur, rain, and occlusion).
  • Trust-SSL adds a per-sample, per-factor trust weight to the alignment objective and uses an additive-residual formulation with a stop-gradient on the trust weight to avoid harming the backbone.
  • Experiments using a 200-epoch protocol on a 210,000-image corpus show the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45: 90.20%, compared with 88.46% for SimCLR and 89.82% for VICReg.
  • The largest gains appear under severe information-erasing corruptions on EuroSAT (+19.9 points over SimCLR on haze at s=5), and a zero-shot cross-domain stress test on BDD100K weather splits shows consistent gains of +1 to +3 points in Mahalanobis AUROC. Two ablations (scalar uncertainty and cosine gate) indicate that the additive-residual formulation is the primary source of these improvements.
  • An evidential variant based on Dempster-Shafer fusion provides interpretable signals (conflict and ignorance), positioning the work as a concrete uncertainty-aware SSL design principle, with code released publicly.
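
The conflict and ignorance signals mentioned in the last point can be illustrated with a minimal sketch of Dempster's rule of combination. This is a generic textbook version, not the paper's implementation: the two-hypothesis frame ("keep" versus "erased"), the example mass values, and the function name dempster_combine are all hypothetical and chosen only for illustration. Conflict is the total mass assigned to contradictory hypothesis pairs, and ignorance is the fused mass left on the whole frame.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Fuse two mass functions with Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass, and the
    masses of each function sum to 1. Returns (fused_masses, conflict).
    """
    fused = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence accumulates on the shared hypotheses.
            fused[intersection] = fused.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Contradictory evidence contributes to the conflict mass K.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; fusion is undefined.")
    # Renormalise by 1 - K so the fused masses sum to 1 again.
    fused = {k: v / (1.0 - conflict) for k, v in fused.items()}
    return fused, conflict


# Hypothetical frame: does a degraded view keep or erase the semantic evidence?
KEEP = frozenset({"keep"})
ERASED = frozenset({"erased"})
FRAME = KEEP | ERASED  # mass placed here means "cannot tell"

# Two illustrative evidence sources (values are made up for this sketch).
source_a = {KEEP: 0.6, ERASED: 0.1, FRAME: 0.3}
source_b = {KEEP: 0.2, ERASED: 0.5, FRAME: 0.3}

fused, conflict = dempster_combine(source_a, source_b)
ignorance = fused.get(FRAME, 0.0)  # fused mass that remains uncommitted

print(f"conflict={conflict:.3f}, ignorance={ignorance:.3f}")
```

High conflict means the sources disagree about a view, while high ignorance means neither source commits to a hypothesis. How the paper maps these quantities onto its trust weight is not specified in this summary.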

Abstract

Self-supervised learning (SSL) is a standard approach for representation learning in aerial imagery. Existing methods enforce invariance between augmented views, which works well when augmentations preserve semantic content. However, aerial images are frequently degraded by haze, motion blur, rain, and occlusion that remove critical evidence. Enforcing alignment between a clean and a severely degraded view can introduce spurious structure into the latent space. This study proposes a training strategy and architectural modification to enhance SSL robustness to such corruptions. It introduces a per-sample, per-factor trust weight into the alignment objective, combined with the base contrastive loss as an additive residual. A stop-gradient is applied to the trust weight, which is not used as a multiplicative gate: although a multiplicative gate is a natural choice, experiments show that it impairs the backbone, whereas the additive-residual approach improves it. Using a 200-epoch protocol on a 210,000-image corpus, the method achieves the highest mean linear-probe accuracy among six backbones on EuroSAT, AID, and NWPU-RESISC45 (90.20% compared to 88.46% for SimCLR and 89.82% for VICReg). It yields the largest improvements under severe information-erasing corruptions on EuroSAT (+19.9 points on haze at s=5 over SimCLR). The method also demonstrates consistent gains of +1 to +3 points in Mahalanobis AUROC on a zero-shot cross-domain stress test using BDD100K weather splits. Two ablations (scalar uncertainty and cosine gate) indicate the additive-residual formulation is the primary source of these improvements. An evidential variant using Dempster-Shafer fusion introduces interpretable signals of conflict and ignorance. These findings offer a concrete design principle for uncertainty-aware SSL. Code is publicly available at https://github.com/WadiiBoulila/trust-ssl.
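
The contrast between a multiplicative gate and an additive residual can be written out as a rough sketch. The abstract does not give the exact objective, so every symbol here is an assumption made for illustration: \ell_{\text{base}} is the base contrastive loss for sample i, \ell_{\text{align}}^{(k)} is an alignment term for augmentation factor k, w_{i,k} \in [0,1] is the per-sample, per-factor trust weight, \lambda is a mixing coefficient, and \operatorname{sg}(\cdot) denotes stop-gradient.

```latex
% Multiplicative gate (assumed form): the trust weight scales the base loss
% itself, so low-trust samples also lose their base training signal.
\mathcal{L}_{\text{mult}}
  = \sum_{i} \Big( \prod_{k} w_{i,k} \Big)\, \ell_{\text{base}}(i)

% Additive residual (assumed form): the base loss is always applied in full,
% and the trust-weighted alignment term is added on top of it.
\mathcal{L}_{\text{add}}
  = \sum_{i} \Big[ \ell_{\text{base}}(i)
      + \lambda \sum_{k} \operatorname{sg}\!\big(w_{i,k}\big)\,
        \ell_{\text{align}}^{(k)}(i) \Big]
```

Under this reading, the stop-gradient means the alignment term cannot lower its own value by shrinking w_{i,k}, and the backbone always receives the unmodified base gradient. That would be consistent with the reported finding that the additive form improves the backbone while the multiplicative gate impairs it, but the released code should be consulted for the actual formulation.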