HEAL: Hindsight Entropy-Assisted Learning for Reasoning Distillation
arXiv cs.AI / 3/12/2026
Key Points
- HEAL is an RL-free framework for distilling reasoning from large reasoning models into smaller ones; it targets the limitations of rejection sampling and the teacher ceiling.
- It combines three modules: Guided Entropy-Assisted Repair (GEAR), which detects critical reasoning breakpoints; Perplexity-Uncertainty Ratio Estimator (PURE), which filters for genuine breakthroughs; and Progressive Answer-guided Curriculum Evolution (PACE), which guides curriculum progression.
- The framework draws on the Zone of Proximal Development to inject hindsight hints and repair broken reasoning trajectories during training.
- Extensive experiments on multiple benchmarks show that HEAL significantly outperforms traditional supervised fine-tuning distillation and other baselines.
- This work presents a new approach in model distillation and demonstrates notable improvements over standard methods.
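
To make the idea of entropy-based breakpoint detection more concrete, here is a minimal sketch. It is not the paper's GEAR implementation: the spike rule, the `spike_ratio` and `window` parameters, and the function names are all assumptions chosen for illustration. It simply flags positions in a reasoning trace where next-token entropy jumps well above its recent average, which is one plausible signal of a point where a model becomes uncertain.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def find_breakpoints(step_distributions, spike_ratio=2.0, window=3):
    """Return indices where entropy spikes above the recent baseline.

    Hypothetical rule (an assumption, not taken from the paper): position i
    is a breakpoint if its entropy exceeds `spike_ratio` times the mean
    entropy of the preceding `window` positions.
    """
    entropies = [token_entropy(p) for p in step_distributions]
    breakpoints = []
    for i in range(window, len(entropies)):
        baseline = sum(entropies[i - window:i]) / window
        if baseline > 0.0 and entropies[i] > spike_ratio * baseline:
            breakpoints.append(i)
    return breakpoints


# Toy trace: three confident steps, one near-uniform (uncertain) step,
# then a confident step again.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
trace = [confident, confident, confident, uncertain, confident]
flagged = find_breakpoints(trace)
```

In a setup like the one the summary describes, flagged positions would be candidates for injecting a hindsight hint; how HEAL actually selects and repairs them is specified in the paper, not here.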




