Collaborative Adaptive Curriculum for Progressive Knowledge Distillation

arXiv cs.LG / 2026-03-24


Key Points

  • The paper proposes Federated Adaptive Progressive Distillation (FAPD) to bridge the gap between complex, high-dimensional teacher knowledge and heterogeneous client learning capacities in edge/distributed visual analytics.
  • FAPD uses PCA-based hierarchical decomposition of teacher features to build a “visual knowledge hierarchy,” then sends clients progressively higher-complexity knowledge via dimension-adaptive projection matrices.
  • A consensus-driven server mechanism tracks network-wide learning stability using global accuracy fluctuations over a temporal window, increasing curriculum complexity only when collective consensus is achieved.
  • Experiments on three datasets show FAPD improves accuracy by 3.64% over FedAvg on CIFAR-10, achieves 2x faster convergence, and remains robust under extreme data heterogeneity (α=0.1), outperforming baselines by over 4.5%.
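The PCA-based hierarchy and dimension-adaptive projection described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names (`build_hierarchy`, `project_to_level`) and the choice of SVD for PCA are assumptions.

```python
import numpy as np

def build_hierarchy(teacher_feats: np.ndarray) -> np.ndarray:
    """Return principal axes of teacher features, ordered by variance.

    teacher_feats: (n_samples, d) matrix of teacher feature vectors.
    The rows of the result are principal directions, highest-variance
    first, forming the "visual knowledge hierarchy".
    """
    centered = teacher_feats - teacher_feats.mean(axis=0)
    # SVD of centered features; right singular vectors are the principal
    # axes, already sorted by singular value (i.e. variance contribution).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt  # shape (d, d)

def project_to_level(feats: np.ndarray, axes: np.ndarray, k: int) -> np.ndarray:
    """Dimension-adaptive projection: keep only the top-k components."""
    proj = axes[:k]        # (k, d) projection matrix for curriculum level k
    return feats @ proj.T  # (n_samples, k) distilled knowledge for clients

rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 64))       # stand-in for teacher features
axes = build_hierarchy(feats)
low = project_to_level(feats, axes, 8)   # early curriculum: coarse knowledge
high = project_to_level(feats, axes, 48) # later curriculum: richer knowledge
print(low.shape, high.shape)
```

Clients at lower curriculum levels receive a low-rank, easier-to-match view of the teacher's feature space; raising `k` monotonically restores variance, which is what makes the hierarchy "progressive".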

Abstract

Recent advances in collaborative knowledge distillation have demonstrated cutting-edge performance for resource-constrained distributed multimedia learning scenarios. However, achieving such competitiveness requires addressing a fundamental mismatch: high-dimensional teacher knowledge complexity versus heterogeneous client learning capacities, which currently prohibits deployment in edge-based visual analytics systems. Drawing inspiration from curriculum learning principles, we introduce Federated Adaptive Progressive Distillation (FAPD), a consensus-driven framework that orchestrates adaptive knowledge transfer. FAPD hierarchically decomposes teacher features via PCA-based structuring, extracting principal components ordered by variance contribution to establish a natural visual knowledge hierarchy. Clients progressively receive knowledge of increasing complexity through dimension-adaptive projection matrices. Meanwhile, the server monitors network-wide learning stability by tracking global accuracy fluctuations across a temporal consensus window, advancing curriculum dimensionality only when collective consensus emerges. Consequently, FAPD provably adapts knowledge transfer pace while achieving superior convergence over fixed-complexity approaches. Extensive experiments on three datasets validate FAPD's effectiveness: it attains 3.64% accuracy improvement over FedAvg on CIFAR-10, demonstrates 2x faster convergence, and maintains robust performance under extreme data heterogeneity (α=0.1), outperforming baselines by over 4.5%.
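The consensus-gated curriculum advancement described in the abstract can be sketched as a small server-side controller. This is a hedged sketch under assumed hyperparameters: the window size, stability threshold, dimension step, and the use of max−min as the fluctuation measure are all illustrative choices, not taken from the paper.

```python
from collections import deque

class CurriculumController:
    """Server-side sketch: advance curriculum dimension only when
    global accuracy has stabilized within a temporal consensus window."""

    def __init__(self, d_start=8, d_max=64, step=8, window=5, eps=0.005):
        self.dim = d_start       # current curriculum dimensionality
        self.d_max = d_max
        self.step = step
        self.eps = eps           # stability threshold on accuracy fluctuation
        self.history = deque(maxlen=window)

    def update(self, global_acc: float) -> int:
        """Record this round's global accuracy; advance on consensus."""
        self.history.append(global_acc)
        if len(self.history) == self.history.maxlen:
            fluct = max(self.history) - min(self.history)
            if fluct < self.eps and self.dim < self.d_max:
                self.dim = min(self.dim + self.step, self.d_max)
                self.history.clear()  # restart stability tracking at new level
        return self.dim

ctrl = CurriculumController()
accs = [0.50, 0.55, 0.60, 0.62, 0.63,        # still improving quickly
        0.631, 0.632, 0.633, 0.632, 0.633]   # stabilized -> advance
dims = [ctrl.update(a) for a in accs]
print(dims)
```

The gating prevents the curriculum from outpacing slower clients: while accuracies still fluctuate, the dimension holds; once the window shows consensus-level stability, the next (richer) level of teacher knowledge is released.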