Learn where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
arXiv cs.AI / 5/4/2026
Key Points
- The paper introduces GUI-SD, an on-policy self-distillation (OPSD) framework designed specifically for GUI grounding, the task of mapping natural-language instructions to the visual coordinates of their targets.
- GUI-SD strengthens the teacher's guidance with a visually enriched privileged context built from the target bounding box and a Gaussian soft mask; this provides dense supervision without leaking the exact target coordinates.
- It applies entropy-guided distillation, weighting each token by its digit significance and by the teacher's confidence, so that training concentrates on the coordinate tokens that are most reliable and most consequential.
- Experiments across six GUI grounding benchmarks show GUI-SD outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency.
- The authors release their code and training data publicly, enabling replication and further development of OPSD for GUI grounding agents.
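A Gaussian soft mask of the kind described in the key points can be sketched as below. This is a hypothetical illustration, not the authors' implementation: it assumes an axis-aligned Gaussian centred on the target bounding box, with a per-axis standard deviation proportional to the box size (`sigma_scale` and `gaussian_soft_mask` are assumed names), evaluated at integer pixel coordinates. It shows how a teacher can be shown a smoothly highlighted region instead of the numeric target coordinates.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def gaussian_soft_mask(width: int, height: int, box: Box,
                       sigma_scale: float = 0.5) -> List[List[float]]:
    """Return a height x width grid of weights in [0, 1] (hypothetical construction).

    The weight is 1.0 at the box centre and decays smoothly with distance,
    with per-axis standard deviation = sigma_scale * box side length.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Guard against degenerate (zero-width or zero-height) boxes.
    sx = max(sigma_scale * (x2 - x1), 1e-6)
    sy = max(sigma_scale * (y2 - y1), 1e-6)
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Squared Mahalanobis distance for an axis-aligned Gaussian.
            d2 = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
            row.append(math.exp(-0.5 * d2))
        mask.append(row)
    return mask


if __name__ == "__main__":
    m = gaussian_soft_mask(10, 10, (2, 2, 8, 8))
    print(round(m[5][5], 3), round(m[5][8], 3), round(m[0][0], 3))
```

How such a mask is rendered into the teacher's screenshot (for example, by dimming each pixel in proportion to `1 - mask`) is likewise an assumption here; the soft falloff is what keeps the highlighted region from pinning down a single exact coordinate.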
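The entropy-guided weighting mentioned in the key points can be sketched as follows. This is a hypothetical illustration, not the paper's formula: it assumes each coordinate is emitted as decimal digit tokens, uses linear place-value weights for digit significance, measures teacher confidence as one minus the teacher's normalised entropy, and multiplies the two to weight a per-token reverse KL (student to teacher), which is one common divergence choice in on-policy distillation. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Dist = Sequence[float]  # probability distribution over the digit vocabulary "0".."9"


def entropy(p: Dist) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl(p: Dist, q: Dist, eps: float = 1e-12) -> float:
    """KL(p || q) in nats; q is clamped at eps to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def digit_significance(num_digits: int) -> List[float]:
    """Hypothetical place-value weights, most significant digit first.

    Linear in position and normalised to sum to 1, e.g. 3 digits -> [3, 2, 1] / 6.
    """
    raw = [float(num_digits - i) for i in range(num_digits)]
    total = sum(raw)
    return [r / total for r in raw]


def token_weights(teacher: List[Dist]) -> List[float]:
    """Significance x teacher confidence for each digit token of one number.

    Confidence = 1 - H(teacher) / log(V), clamped at 0: 1 for a one-hot
    teacher, about 0 for a uniform (uninformative) one.
    """
    sig = digit_significance(len(teacher))
    return [s * max(0.0, 1.0 - entropy(p) / math.log(len(p)))
            for s, p in zip(sig, teacher)]


def entropy_guided_loss(teacher: List[Dist], student: List[Dist]) -> float:
    """Weighted mean of per-token reverse KL(student || teacher)."""
    w = token_weights(teacher)
    total = sum(w)
    if total == 0.0:
        return 0.0  # teacher is uninformative on every token
    return sum(wi * kl(s, t) for wi, s, t in zip(w, student, teacher)) / total


if __name__ == "__main__":
    peaked = [0.91] + [0.01] * 9
    uniform = [0.1] * 10
    print(token_weights([peaked, peaked, uniform]))
```

Under these assumptions a token on which the teacher is uncertain contributes almost nothing, and a mismatch in the leading digit of a coordinate (for example the hundreds digit) costs more than one in the units digit.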