MEDiC: Multi-objective Exploration of Distillation from CLIP
arXiv cs.CV / 4/1/2026
Key Points
- MEDiC is a new distillation framework that unifies masked image modeling in both pixel space and latent feature space by combining patch-level token distillation from a frozen CLIP encoder, global CLS alignment, and pixel reconstruction with a lightweight decoder.
- Experiments show that the three objectives are complementary, with the full combination achieving 73.9% kNN accuracy on ImageNet-1K using ViT-Base.
- The paper investigates evolved masking strategies using hierarchical clustering and relative position bias, but finds that they do not improve teacher-guided distillation over simpler block masking, likely because the frozen teacher already supplies semantic awareness.
- It reports high sensitivity to loss weighting, where small perturbations to scalar loss weights can reduce kNN accuracy by as much as 17 percentage points.
- The authors report overall performance of 73.9% kNN accuracy and 85.1% fine-tuning accuracy after 300 epochs with ViT-Base, alongside a systematic study of the design space.
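The three objectives above can be sketched as a single weighted loss. The sketch below is illustrative, not the paper's implementation: all function names, tensor shapes, and default weights (`w_patch`, `w_cls`, `w_pixel`) are assumptions, and cosine distance / masked MSE are plausible but unconfirmed choices for the distillation and reconstruction terms.

```python
# Hypothetical sketch of MEDiC's combined objective: a weighted sum of
# (1) patch-level token distillation against a frozen CLIP teacher,
# (2) global CLS alignment, and (3) masked-pixel reconstruction.
# Names, shapes, and weights are illustrative, not taken from the paper.
import numpy as np

def cosine_distill_loss(student, teacher):
    """1 - mean cosine similarity between matching token embeddings."""
    s = student / np.linalg.norm(student, axis=-1, keepdims=True)
    t = teacher / np.linalg.norm(teacher, axis=-1, keepdims=True)
    return 1.0 - float(np.mean(np.sum(s * t, axis=-1)))

def medic_loss(student_patches, teacher_patches,   # (N, D) patch tokens
               student_cls, teacher_cls,           # (D,) CLS embeddings
               recon_pixels, target_pixels, mask,  # (N, P) pixels, (N,) 0/1 mask
               w_patch=1.0, w_cls=1.0, w_pixel=1.0):
    # Patch-token distillation against the frozen teacher's patch features.
    l_patch = cosine_distill_loss(student_patches, teacher_patches)
    # Global alignment of the student CLS token with the teacher CLS token.
    l_cls = cosine_distill_loss(student_cls[None], teacher_cls[None])
    # Pixel reconstruction averaged only over masked patches (MAE-style).
    diff = (recon_pixels - target_pixels) ** 2
    l_pixel = float((diff.mean(axis=-1) * mask).sum() / max(mask.sum(), 1))
    return w_patch * l_patch + w_cls * l_cls + w_pixel * l_pixel
```

The paper's reported sensitivity to loss weighting (up to 17 points of kNN accuracy) would correspond here to perturbing the three `w_*` scalars.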