Rethinking Dataset Distillation: Hard Truths about Soft Labels
arXiv cs.LG / 4/22/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- New evidence suggests that, in soft-label downstream training, simple random subsets can match state-of-the-art dataset distillation (DD) methods, undermining assumptions that DD quality improvements always matter.
- A scalability analysis across soft-label (SL), soft-label with knowledge distillation (SL+KD), and hard-label (HL) training regimes finds that high-quality coresets do not clearly beat random baselines in SL and SL+KD, and that performance saturates near full-dataset levels in the SL+KD setting under a fixed compute budget.
- The results challenge common evaluation practices that rely on soft labels, because—unlike hard-label settings—subset quality has negligible impact on evaluation outcomes under soft-label training.
- In the HL setting, only the RDED DD method consistently beats random baselines on ImageNet-1K, yet it can still trail strong coreset approaches because it over-relies on easy sample patches.
- The paper proposes CAD-Prune, a compute-aware pruning method that selects samples of optimal difficulty, and CA2D, a compute-aligned DD method; both outperform existing DD methods on ImageNet-1K across various IPC settings.
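For context, the random baseline these comparisons use is just a class-balanced random subset: pick IPC (images per class) samples per class uniformly at random. A minimal sketch of such a baseline (the function name and interface are illustrative, not from the paper):

```python
import random
from collections import defaultdict

def random_ipc_subset(labels, ipc, seed=0):
    """Return indices of a class-balanced random subset: `ipc` samples
    per class, chosen uniformly at random. `labels` is a sequence of
    integer class labels for the full dataset."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    chosen = []
    for idxs in by_class.values():
        # guard against classes with fewer than `ipc` samples
        chosen.extend(rng.sample(idxs, min(ipc, len(idxs))))
    return sorted(chosen)

# toy example: 3 classes with 10 samples each, IPC = 2
labels = [i // 10 for i in range(30)]
subset = random_ipc_subset(labels, ipc=2)
print(len(subset))  # prints 6 (2 indices per class)
```

Under soft-label training, the paper's finding is that subsets selected this cheaply already match far more expensive distilled datasets, which is what makes the evaluation-practice critique bite.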