Dual-Teacher Distillation with Subnetwork Rectification for Black-Box Domain Adaptation

arXiv cs.CV / 3/25/2026


Key Points

  • The paper studies black-box domain adaptation where the source data and source model are inaccessible, and transferable knowledge is obtained only via querying the black-box source model with target samples.
  • It proposes Dual-Teacher Distillation with Subnetwork Rectification (DDSR), which combines predictions from the black-box source model (specific knowledge) and a vision-language (ViL) model (general semantic priors) to produce more reliable pseudo labels.
  • DDSR introduces subnetwork-driven regularization to reduce overfitting that can arise from noisy pseudo-label supervision, improving robustness during adaptation.
  • The method iteratively refines both the target pseudo labels and the ViL prompts, then further optimizes the target model via self-training with class-wise prototypes.
  • Experiments across multiple benchmarks show DDSR delivers consistent gains over prior state-of-the-art approaches, including those that assume access to source data or source models.
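The summary above does not spell out how the two teachers' predictions are combined. As a rough illustration only, a confidence-weighted fusion of the black-box source model's and the ViL's class probabilities might look like the sketch below; the function name, the per-sample max-probability weighting, and the `temperature` parameter are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def fuse_teacher_predictions(p_src, p_vil, temperature=1.0):
    """Confidence-weighted fusion of two teachers' class probabilities.

    p_src: (N, C) softmax outputs queried from the black-box source model.
    p_vil: (N, C) softmax outputs from the vision-language model.
    Returns the fused (N, C) probabilities and hard pseudo labels (N,).
    """
    # Weight each teacher per sample by its confidence (max class probability),
    # then normalize the two weights with a softmax over the teacher axis.
    w_src = p_src.max(axis=1, keepdims=True)
    w_vil = p_vil.max(axis=1, keepdims=True)
    weights = np.exp(np.concatenate([w_src, w_vil], axis=1) / temperature)
    weights /= weights.sum(axis=1, keepdims=True)

    # Convex combination of the two distributions; argmax gives pseudo labels.
    fused = weights[:, :1] * p_src + weights[:, 1:] * p_vil
    return fused, fused.argmax(axis=1)
```

Since each row of `p_src` and `p_vil` sums to one and the weights are a convex combination, the fused rows remain valid probability distributions, so the pseudo labels can be thresholded on fused confidence if desired.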

Abstract

Assuming that neither the source data nor the source model is accessible, black-box domain adaptation represents a highly practical yet extremely challenging setting, as transferable information is restricted to the predictions of the black-box source model, which can only be queried with target samples. Existing approaches attempt to extract transferable knowledge through pseudo-label refinement or by leveraging external vision-language models (ViLs), but they often suffer from noisy supervision or insufficient use of the semantic priors provided by ViLs, which ultimately hinders adaptation performance. To overcome these limitations, we propose a Dual-Teacher Distillation with Subnetwork Rectification (DDSR) model that jointly exploits the specific knowledge embedded in the black-box source model and the general semantic information of a ViL. DDSR adaptively integrates their complementary predictions to generate reliable pseudo labels for the target domain and introduces a subnetwork-driven regularization strategy to mitigate overfitting caused by noisy supervision. Furthermore, the refined target predictions iteratively enhance both the pseudo labels and the ViL prompts, enabling more accurate and semantically consistent adaptation. Finally, the target model is further optimized through self-training with class-wise prototypes. Extensive experiments on multiple benchmark datasets validate the effectiveness of our approach, demonstrating consistent improvements over state-of-the-art methods, including those using source data or models.
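The abstract's final step, self-training with class-wise prototypes, is a standard pattern: average the features of samples sharing a pseudo label into one prototype per class, then reassign each sample to its nearest prototype. The following is a minimal sketch of that generic pattern, assuming cosine similarity and L2-normalized prototypes; the function names and these choices are illustrative, not taken from the paper.

```python
import numpy as np

def classwise_prototypes(features, pseudo_labels, num_classes):
    """Mean feature per pseudo class, L2-normalized row-wise."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = pseudo_labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    return protos / np.clip(norms, 1e-12, None)

def refine_by_prototype(features, protos):
    """Reassign each sample to its nearest prototype by cosine similarity."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    f = features / np.clip(norms, 1e-12, None)
    return (f @ protos.T).argmax(axis=1)
```

In a full pipeline these refined labels would supervise the next round of target-model training, with prototypes recomputed as the feature extractor improves.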