Diverse Image Priors for Black-box Data-free Knowledge Distillation

arXiv cs.LG / 4/29/2026


Key Points

  • The paper studies black-box data-free knowledge distillation, where the student can query the teacher only for its top-1 (hard-label) predictions and has no access to the original training data or to the teacher's internals such as logits or soft probabilities.
  • It introduces DIP-KD (Diverse Image Priors Knowledge Distillation), which improves synthetic-data-based distillation through a three-phase collaborative pipeline: synthesis of diverse image priors, contrastive learning to make the synthetic samples collectively more distinct (a minimal sketch of this idea follows the list), and soft-probability distillation via a primer student.
  • The method specifically targets two limitations of prior synthetic-data approaches: insufficient diversity of the generated samples and the weak distillation signal they provide.
  • Experiments on 12 benchmarks show DIP-KD achieves state-of-the-art results, and ablation studies indicate that data diversity is a key factor for effective knowledge acquisition under restrictive conditions.
  • The contribution is positioned as practical for privacy-preserving or decentralized AI ecosystems where data and model access are constrained.
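To make the "collective distinction" idea in the second key point concrete, here is a minimal sketch, not the paper's implementation, of a contrastive-style objective that pushes synthetic samples apart in feature space. The function and variable names (`pairwise_separation_loss`, `features`) are illustrative assumptions; DIP-KD's actual loss may differ.

```python
import torch
import torch.nn.functional as F

def pairwise_separation_loss(features: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Penalize similarity between distinct synthetic samples.

    features: (N, D) embeddings of N synthetic images.
    Returns a scalar that is low when the samples are mutually distinct.
    """
    z = F.normalize(features, dim=1)                      # unit-norm embeddings
    sim = z @ z.t() / temperature                         # (N, N) scaled cosine similarities
    off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    # Soft-maximum over off-diagonal similarities: minimizing it spreads samples apart.
    return torch.logsumexp(sim[off_diag], dim=0)
```

In a synthesis loop, this term would be added to whatever class-conditional objective generates the image priors, trading fidelity against diversity.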

Abstract

Knowledge distillation (KD) represents a vital mechanism to transfer expertise from complex teacher networks to efficient student models. However, in decentralized or secure AI ecosystems, privacy regulations and proprietary interests often restrict access to the teacher's interface and original datasets. These constraints define a challenging black-box data-free KD scenario where only top-1 predictions and no training data are available. While recent approaches utilize synthetic data, they still face limitations in data diversity and distillation signals. We propose Diverse Image Priors Knowledge Distillation (DIP-KD), a framework that addresses these challenges through a three-phase collaborative pipeline: (1) Synthesis of image priors to capture diverse visual patterns and semantics; (2) Contrast to enhance the collective distinction between synthetic samples via contrastive learning; and (3) Distillation via a novel primer student that enables soft-probability KD. Our evaluation across 12 benchmarks shows that DIP-KD achieves state-of-the-art performance, with ablations confirming data diversity as critical for knowledge acquisition in restricted AI environments.
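The abstract's third phase hinges on turning hard-label feedback into soft targets. The following is a hedged sketch of how a primer student could enable soft-probability KD under top-1-only access; the names (`distill_step`, `primer`, `student`, `teacher_top1`) are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn.functional as F

def distill_step(student, primer, images, teacher_top1, T: float = 4.0):
    """One training step for the final student.

    teacher_top1: (N,) class indices returned by the black-box teacher.
    The primer is assumed to have been fitted to those hard labels beforehand.
    """
    with torch.no_grad():
        soft_targets = F.softmax(primer(images) / T, dim=1)        # primer's softened predictions
    logits = student(images)
    hard_loss = F.cross_entropy(logits, teacher_top1)              # top-1 (hard-label) signal
    soft_loss = F.kl_div(F.log_softmax(logits / T, dim=1),
                         soft_targets, reduction="batchmean") * (T * T)
    return hard_loss + soft_loss
```

The design intent, as described in the abstract, is that the primer recovers a richer probability distribution than the teacher's hard labels alone, giving the final student a soft-probability signal despite the black-box constraint.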