Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding
arXiv cs.LG / 4/24/2026
Key Points
- The paper addresses GUI grounding—mapping natural-language instructions to exact pixel coordinates—where current models often miss precise localization despite understanding semantic intent.
- Instead of relying on static self-consistency methods (e.g., geometric clustering or higher Pass@k sampling), it introduces a learnable selection mechanism that picks the best target by having a visual critic critique the model’s rendered proposals.
- It proposes a co-evolving “Propose-then-Critic” framework that trains the proposer and critic jointly in a mutually reinforcing loop, addressing the capability mismatch that arises when one component outpaces the other.
- The training method uses maturity-aware adaptive co-evolutionary reinforcement learning to dynamically balance proposer/critic objectives, improving both spatial exploration and critic discrimination.
- Experiments across six benchmarks show significant gains in grounding accuracy and in the critic’s reliability.
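The selection step described above can be sketched as follows. This is an illustrative toy, not the paper's implementation: the function names (`propose`, `render_marker`, `critic_score`, `select_click`) are assumptions, the proposer is a random sampler standing in for a temperature-sampled grounding VLM, and the critic is a distance heuristic standing in for a trained visual critic that judges rendered proposals.

```python
import random

def propose(screenshot, k=8, rng=None):
    """Stand-in proposer: sample k candidate click coordinates.
    A real proposer would be a grounding VLM sampled with temperature."""
    rng = rng or random.Random(0)
    w, h = screenshot["size"]
    return [(rng.randrange(w), rng.randrange(h)) for _ in range(k)]

def render_marker(screenshot, point):
    """Stand-in for drawing the proposed click point onto the screenshot,
    so the critic can judge the candidate visually rather than as raw text."""
    return {**screenshot, "marker": point}

def critic_score(rendered):
    """Toy critic: scores a rendered proposal; here, closeness to a labeled
    target. A real critic would be a trained vision-language model."""
    tx, ty = rendered["target"]
    x, y = rendered["marker"]
    return -((x - tx) ** 2 + (y - ty) ** 2)  # higher is better

def select_click(screenshot, k=8):
    """Propose k candidates, render each, and let the critic pick one,
    instead of static clustering or majority voting over samples."""
    candidates = propose(screenshot, k)
    return max(candidates,
               key=lambda p: critic_score(render_marker(screenshot, p)))

screenshot = {"size": (1920, 1080), "target": (600, 400)}
best = select_click(screenshot, k=16)
```

The key design point the paper argues for is that this selector is *learned*: during training, the critic's scores are shaped by reinforcement learning alongside the proposer, rather than being a fixed geometric rule.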