The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation
arXiv cs.LG / 4/21/2026
Key Points
- The paper identifies a “Scaling Law of Miscalibration” in on-policy distillation, where models improve task accuracy but become systematically overconfident.
- The root cause is framed as an information mismatch: teachers provide supervision using privileged context available during training, while deployed models must produce confidence from deployment-time information.
- The authors formalize how teacher-conditioned success fails as a target for deployment-time confidence, and show that privileged context can lead to entropy collapse and optimism bias.
- To fix this, they introduce CaOPD (Calibration-aware On-Policy Distillation), which estimates the student's empirical confidence from its own rollouts and distills against student-grounded confidence targets instead of teacher-conditioned ones.
- Experiments across models and domains indicate that CaOPD achieves Pareto-optimal calibration while preserving competitive capability, and that it generalizes robustly to out-of-distribution and continual-learning settings.
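The core idea behind a student-grounded confidence target can be sketched in a few lines: instead of taking the teacher's (privileged-context) judgment as the target, sample the student's own completions and grade them, using the empirical success rate as the calibration target. This is a minimal toy sketch, not the paper's implementation; the function names, parameters, and the toy sampler/grader below are all illustrative assumptions.

```python
import random

def empirical_confidence(sample_fn, grade_fn, prompt, n_rollouts=8):
    # Sample n_rollouts completions from the student and grade each one;
    # the mean success rate serves as a student-grounded confidence target,
    # reflecting what the model can actually do at deployment time.
    successes = sum(grade_fn(prompt, sample_fn(prompt)) for _ in range(n_rollouts))
    return successes / n_rollouts

# Toy stand-ins (hypothetical): a "student" that answers correctly ~60% of the time.
random.seed(0)

def toy_sample(prompt):
    return "correct" if random.random() < 0.6 else "wrong"

def toy_grade(prompt, answer):
    return answer == "correct"

# With enough rollouts, the estimate approaches the student's true success rate (~0.6),
# even if a teacher grading with privileged context would rate the task as near-certain.
target = empirical_confidence(toy_sample, toy_grade, "2+2?", n_rollouts=1000)
```

The contrast with teacher-conditioned targets is the point: a target derived this way can never claim confidence the student's own behavior does not support, which is what guards against the entropy collapse and optimism bias described above.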