Unlocking Optical Prior: Spectrum-Guided Knowledge Transfer for SAR Generalized Category Discovery

arXiv cs.CV / 4/27/2026


Key Points

  • The paper targets Generalized Category Discovery (GCD) on label-scarce Synthetic Aperture Radar (SAR) data, where the optical priors of large vision models are incompatible with SAR imagery.
  • It introduces the Modal Discrepancy Curve (MDC), modeling cross-modal mismatch as a structured frequency-domain descriptor based on spectral energy distributions.
  • Building on MDC, the authors propose MCPT (MDC-guided Cross-modal Prior Transfer), a pre-training framework on paired optical–SAR data. Adaptive Frequency Tokenization (AFT) converts the MDC into learnable tokens, and Frequency-aware Expert Refinement (FER) uses those tokens to refine features band by band in a discrepancy-aware way.
  • The approach uses contrastive learning to align refined embeddings across optical and SAR modalities, then transfers the learned SAR representations to downstream single-modal SAR-GCD tasks.
  • The authors report state-of-the-art results on multiple mainstream datasets, which they take as evidence that frequency-domain discrepancy modeling transfers optical priors to SAR more effectively.
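
To make the idea of a frequency-domain discrepancy descriptor concrete, here is a minimal sketch. The summary does not give the paper's exact MDC definition, so everything below is an assumption for illustration: `radial_energy_profile` and `discrepancy_curve` are hypothetical names, the spectrum is split into equal-width radial bands, and the curve is taken as a per-band log-ratio of normalized spectral energy between a SAR image and its paired optical image.

```python
import numpy as np

def radial_energy_profile(img, n_bands=16):
    """Normalized spectral energy per radial frequency band of a 2-D image."""
    img = np.asarray(img, dtype=np.float64)
    # Power spectrum with the zero frequency shifted to the center.
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    # Normalized radial frequency of every FFT bin (0 at DC, 0.5 at Nyquist on an axis).
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    # Assign each bin to one of n_bands equal-width rings; corners go to the last band.
    edges = np.linspace(0.0, 0.5, n_bands + 1)
    band = np.clip(np.digitize(radius, edges) - 1, 0, n_bands - 1)
    energy = np.bincount(band.ravel(), weights=power.ravel(), minlength=n_bands)
    return energy / energy.sum()

def discrepancy_curve(optical, sar, n_bands=16, eps=1e-12):
    """Per-band log-ratio of SAR to optical spectral energy (assumed stand-in for MDC)."""
    e_opt = radial_energy_profile(optical, n_bands)
    e_sar = radial_energy_profile(sar, n_bands)
    return np.log(e_sar + eps) - np.log(e_opt + eps)

# Toy pair: a smooth "optical-like" image and the same scene with multiplicative,
# speckle-like noise standing in for SAR. Real use would take co-registered pairs.
rng = np.random.default_rng(0)
smooth = np.outer(np.hanning(64), np.hanning(64))
speckled = smooth * rng.exponential(1.0, (64, 64))
mdc = discrepancy_curve(smooth, speckled)
```

In this toy setting the curve is negative in the lowest band and positive in the highest, reflecting how speckle shifts energy toward high frequencies. A vector of this kind is what a tokenization step such as AFT could plausibly consume, though how the paper actually does so is not stated here.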

Abstract

Generalized Category Discovery (GCD) holds significant promise for the label-scarce Synthetic Aperture Radar (SAR) domain, yet its efficacy is severely constrained by the cross-modal incompatibility between the optical prior inherent in Large Vision Models (LVMs) and SAR imagery. Existing domain adaptation methods often lack an inductive bias that reflects imaging characteristics and consequently fail to transfer the optical prior into the SAR domain effectively. To address this issue, we introduce the Modal Discrepancy Curve (MDC), which models cross-modal discrepancy as a structured frequency-domain descriptor derived from spectral energy distributions. Leveraging this formulation, we propose the MDC-guided Cross-modal Prior Transfer (MCPT) framework, a pre-training paradigm that operates on paired optical-SAR data. Within this framework, Adaptive Frequency Tokenization (AFT) converts the MDC into learnable tokens, and Frequency-aware Expert Refinement (FER) uses these tokens to perform band-wise, discrepancy-aware feature refinement. Contrastive learning then aligns the refined embeddings across modalities and internalizes the adaptation pattern. Finally, the SAR feature representations learned during paired pre-training are applied to downstream single-modal SAR-GCD tasks. Extensive experiments demonstrate state-of-the-art performance across multiple mainstream datasets, indicating that frequency-domain discrepancy modeling enables more effective adaptation of the optical prior to SAR imagery.
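
The cross-modal alignment step can be illustrated with a short sketch. The abstract says only that contrastive learning aligns the refined embeddings; the symmetric InfoNCE objective, the function name `info_nce`, and the temperature value below are assumptions chosen for illustration, not the paper's stated loss.

```python
import numpy as np

def info_nce(opt_emb, sar_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired optical/SAR embeddings."""
    # L2-normalize rows so dot products are cosine similarities.
    o = opt_emb / np.linalg.norm(opt_emb, axis=1, keepdims=True)
    s = sar_emb / np.linalg.norm(sar_emb, axis=1, keepdims=True)
    logits = o @ s.T / temperature  # (batch, batch); the diagonal holds true pairs

    def cross_entropy_diag(z):
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the optical-to-SAR and SAR-to-optical directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Toy check: nearly identical pairs should score lower than mismatched pairs.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))
aligned_loss = info_nce(x, x + 0.01 * rng.normal(size=x.shape))
mismatched_loss = info_nce(x, x[::-1])
```

Minimizing a loss of this form pulls each optical embedding toward its paired SAR embedding and pushes it away from the other SAR embeddings in the batch, which is one common way to realize the alignment the abstract describes.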