AdaZoom-GUI: Adaptive Zoom-based GUI Grounding with Instruction Refinement
arXiv cs.CV / 3/19/2026
Key Points
- AdaZoom-GUI is an adaptive zoom-based GUI grounding framework. Its instruction refinement module rewrites natural language commands into explicit descriptions to improve localization accuracy.
- A second-stage zoom-in is applied conditionally: it helps localize small GUI elements, while simpler cases skip it, which avoids unnecessary computation and loss of context.
- The approach draws on a high-quality GUI grounding dataset, and the model is trained with Group Relative Policy Optimization (GRPO) to predict both click coordinates and element bounding boxes.
- In the reported experiments, it achieves state-of-the-art performance compared with models of comparable or even larger parameter counts, which points to its effectiveness for high-resolution GUI understanding and practical GUI agent deployment.
- The work could benefit automated GUI interaction workflows, particularly on high-resolution interfaces.
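
The conditional second-stage zoom described in the key points can be illustrated with a minimal sketch. It is not the paper's implementation: the summary does not state when the zoom is triggered, so the area-ratio threshold `min_area_ratio`, the crop `scale`, and the names `Prediction`, `crop_window`, `to_full_image`, and `adaptive_ground` are all hypothetical. The grounding model is stood in for by a callable `ground(region)` that is assumed to return coordinates relative to the top-left corner of the region it was given, with no resizing of the crop.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Prediction:
    click: Tuple[float, float]  # predicted click point
    box: Box                    # predicted element bounding box


def crop_window(box: Box, image_size: Tuple[int, int], scale: float = 3.0) -> Box:
    """Crop region centered on `box`, `scale` times its size, clamped to the image."""
    w, h = image_size
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * scale / 2
    half_h = (box[3] - box[1]) * scale / 2
    return (max(0.0, cx - half_w), max(0.0, cy - half_h),
            min(float(w), cx + half_w), min(float(h), cy + half_h))


def to_full_image(pred: Prediction, crop: Box) -> Prediction:
    """Translate a prediction from crop coordinates back to full-image coordinates."""
    ox, oy = crop[0], crop[1]
    x1, y1, x2, y2 = pred.box
    return Prediction(
        click=(pred.click[0] + ox, pred.click[1] + oy),
        box=(x1 + ox, y1 + oy, x2 + ox, y2 + oy),
    )


def adaptive_ground(
    ground: Callable[[Box], Prediction],
    image_size: Tuple[int, int],
    min_area_ratio: float = 0.001,  # hypothetical trigger, not from the paper
) -> Prediction:
    """Run one grounding pass; zoom in and re-run only if the target looks small."""
    w, h = image_size
    first = ground((0.0, 0.0, float(w), float(h)))
    x1, y1, x2, y2 = first.box
    area_ratio = ((x2 - x1) * (y2 - y1)) / (w * h)
    if area_ratio >= min_area_ratio:
        # Simple case: keep the first-stage answer and skip the second pass.
        return first
    # Small element: ground again on a crop around the first-stage box.
    crop = crop_window(first.box, image_size)
    second = ground(crop)
    return to_full_image(second, crop)
```

With this structure, large and easily located elements cost a single model call, while small ones get a second, more detailed look. The crop is kept larger than the first-stage box so that some surrounding context is preserved in the second pass.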