Dual-Stage Invariant Continual Learning under Extreme Visual Sparsity

arXiv cs.LG / 3/30/2026

Key Points

  • The paper studies continual learning for object detection under extreme visual sparsity (e.g., space-based resident space object (RSO) detection), where foreground signals are overwhelmed by background and the feature backbone destabilizes under sequential domain shifts.
  • It shows analytically that background-driven gradients cause progressive representation drift, exposing a structural weakness in continual learning methods that rely only on output-level distillation.
  • To counter this, the authors propose a dual-stage invariant continual learning framework that applies joint distillation to both intermediate backbone representations (structural consistency) and detection predictions (semantic consistency).
  • They further introduce sparsity-aware data conditioning (patch-based sampling combined with distribution-aware augmentation) to regulate gradient statistics under severe imbalance.
  • Experiments on a high-resolution space-based RSO detection dataset show an absolute +4.0 mAP improvement over established continual object detection methods under sequential domain shifts.
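
The joint distillation idea in the key points above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the mean-squared-error feature term, the temperature-softened KL term, and the weights `lambda_feat` and `lambda_out` are all assumptions chosen for illustration. The sketch only shows how a structural term on backbone features and a semantic term on predictions might be added to the detection loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def feature_distill_loss(student_feats, teacher_feats):
    """Structural term (assumed form): MSE between backbone features."""
    assert len(student_feats) == len(teacher_feats)
    squared = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared) / len(squared)

def output_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Semantic term (assumed form): KL(teacher || student) on softened scores."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def joint_loss(task_loss, student_feats, teacher_feats,
               student_logits, teacher_logits,
               lambda_feat=1.0, lambda_out=1.0):
    """Detection loss plus both distillation terms (hypothetical weighting)."""
    structural = feature_distill_loss(student_feats, teacher_feats)
    semantic = output_distill_loss(student_logits, teacher_logits)
    return task_loss + lambda_feat * structural + lambda_out * semantic

# The teacher is the frozen model from the previous domain; the student is
# the model being adapted. Drift in the backbone features raises the loss
# even when the output logits still match the teacher exactly.
no_drift = joint_loss(1.5, [0.2, 0.4], [0.2, 0.4], [1.0, 2.0], [1.0, 2.0])
drifted = joint_loss(1.5, [0.2, 0.9], [0.2, 0.4], [1.0, 2.0], [1.0, 2.0])
print(no_drift, drifted)
```

The second call is the case that an output-only distillation loss would miss: the predictions agree with the teacher while the intermediate representation has already moved.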

Abstract

Continual learning seeks to maintain stable adaptation in non-stationary environments, yet this problem becomes particularly challenging in object detection, where most existing methods implicitly assume relatively balanced visual conditions. In extreme-sparsity regimes, such as those observed in space-based resident space object (RSO) detection, foreground signals are overwhelmingly dominated by background observations. Under such conditions, we analytically demonstrate that background-driven gradients destabilize the feature backbone during sequential domain shifts, causing progressive representation drift. This exposes a structural limitation of continual learning approaches that rely solely on output-level distillation, as they fail to preserve intermediate representation stability. To address this, we propose a dual-stage invariant continual learning framework via joint distillation, enforcing structural consistency on backbone representations and semantic consistency on detection predictions, thereby suppressing error propagation at its source while maintaining adaptability. Furthermore, to regulate gradient statistics under severe imbalance, we introduce a sparsity-aware data conditioning strategy combining patch-based sampling and distribution-aware augmentation. Experiments on a high-resolution space-based RSO detection dataset show consistent improvement over established continual object detection methods, achieving an absolute gain of +4.0 mAP under sequential domain shifts.