VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
arXiv cs.CL / 4/24/2026
Key Points
- VLAA-GUI is a modular framework for autonomous GUI agents that addresses early stopping and repetitive failure loops by introducing three integrated components: Stop, Recover, and Search.
- A mandatory Completeness Verifier enforces UI-observable success criteria at every completion step, using an agent-level cross-check to reject success claims without direct visual evidence.
- A mandatory Loop Breaker reduces repeated failures via multi-tier filtering, including switching interaction modes, changing strategies when screen states recur, and linking reflection signals to strategy shifts.
- An on-demand Search Agent queries an LLM with search capabilities for unfamiliar workflows, while on-demand Coding and Grounding Agents support code-intensive actions and precise action grounding as needed.
- Across five strong GUI-automation backbones on Linux and Windows benchmarks, VLAA-GUI reaches 77.5% on OSWorld and 61.0% on WindowsAgentArena. Ablations show consistent gains, and analysis indicates that the Loop Breaker cuts wasted steps by nearly half for loop-prone models.
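
To make the Stop and Recover ideas above more concrete, here is a minimal Python sketch of one Loop Breaker tier (changing strategy when a screen state recurs) and of a completion check that rejects success claims lacking visible evidence. It is an illustration under stated assumptions, not the paper's implementation: the names `LoopBreaker`, `observe`, `max_repeats`, and `verify_completion` are hypothetical, the repeat threshold of 3 is arbitrary, and hashing raw screenshot bytes stands in for whatever state comparison the framework actually uses.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LoopBreaker:
    """Hypothetical sketch: signal a strategy change when a screen state recurs."""

    max_repeats: int = 3  # assumed threshold, not taken from the paper
    seen: Counter = field(default_factory=Counter)

    def observe(self, screenshot_bytes: bytes) -> bool:
        """Record one screen state; return True once it has recurred too often."""
        # Exact-match hashing is a simplification; a real agent would likely
        # need a comparison that tolerates small pixel differences.
        digest = hashlib.sha256(screenshot_bytes).hexdigest()
        self.seen[digest] += 1
        return self.seen[digest] >= self.max_repeats


def verify_completion(claims_done: bool, evidence_visible: bool) -> bool:
    """Accept a 'done' claim only if UI-observable evidence supports it."""
    return claims_done and evidence_visible


if __name__ == "__main__":
    breaker = LoopBreaker()
    for frame in (b"settings", b"dialog", b"settings", b"settings"):
        if breaker.observe(frame):
            print("recurring screen state: switch strategy")
    # A success claim without visual evidence is rejected.
    print(verify_completion(claims_done=True, evidence_visible=False))
```

In this sketch the third sighting of the same screen triggers the strategy-change signal. The other tiers described above (switching interaction modes, linking reflection signals to strategy shifts) are not modeled.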


