VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents

arXiv cs.CL / 4/6/2026

Key Points

The paper proposes VeriOS, a query-driven framework for human–agent–GUI interaction that helps OS agents decide when to ask for human input to avoid over-execution in untrustworthy real-world settings.
It introduces VeriOS-Agent, trained with a three-stage learning approach designed to decouple and leverage meta-knowledge via supervised fine-tuning and group relative policy optimization.
VeriOS-Agent is intended to autonomously execute tasks under normal (trustworthy) conditions while proactively querying humans when conditions appear unreliable.
Experiments report a 19.72% improvement in average step-wise success rate over strong baselines without degrading normal-condition performance.
The authors provide code, datasets, and models publicly, and claim improved rationality, generalizability, and scalability based on their analyses.

Abstract

With the rapid progress of multimodal large language models, operating system (OS) agents are becoming increasingly capable of automating tasks through on-device graphical user interfaces (GUIs). However, most existing OS agents are designed for idealized settings, whereas real-world environments often present untrustworthy conditions. To mitigate the risk of over-execution in such scenarios, we propose a query-driven human-agent-GUI interaction framework that enables OS agents to decide when to query humans for more reliable task completion. Built upon this framework, we introduce VeriOS-Agent, a trustworthy OS agent trained with a three-stage learning paradigm that facilitates the decoupling and utilization of meta-knowledge through supervised fine-tuning and group relative policy optimization. Concretely, VeriOS-Agent autonomously executes actions in normal conditions while proactively querying humans in untrustworthy scenarios. Experiments show that VeriOS-Agent improves the average step-wise success rate by 19.72% over the strongest baselines, without compromising normal performance. VeriOS-Agent significantly improves performance in untrustworthy scenarios while maintaining comparable performance in trustworthy scenarios. Analysis highlights VeriOS-Agent's rationality, generalizability, and scalability. The code, datasets, and models are available at https://github.com/Wuzheng02/VeriOS.
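To make the "act autonomously or query the human" idea concrete, here is a minimal sketch of such a decision loop. It is not the authors' implementation: `Decision`, `run_episode`, `toy_policy`, and the keyword-based check for a suspicious screen are all hypothetical stand-ins for the trained VeriOS-Agent, which the abstract describes only at a high level.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    kind: str     # "act" (execute on the GUI) or "query" (ask the human)
    content: str  # the GUI action, or the question to put to the human


# Hypothetical policy signature: (observation, optional human answer) -> Decision
Policy = Callable[[str, Optional[str]], Decision]


def run_episode(policy: Policy,
                ask_human: Callable[[str], str],
                observations: List[str]) -> List[str]:
    """Step through observations, querying the human only when the policy asks to."""
    log = []
    for obs in observations:
        decision = policy(obs, None)
        if decision.kind == "query":
            # Untrustworthy condition: pause and ask instead of over-executing.
            answer = ask_human(decision.content)
            log.append(f"QUERY: {decision.content} -> {answer}")
            decision = policy(obs, answer)
            if decision.kind == "query":
                log.append("STOP: still unsure after human answer")
                break
        log.append(f"ACT: {decision.content}")
    return log


def toy_policy(obs: str, answer: Optional[str]) -> Decision:
    """Toy stand-in for a learned agent: treats any 'popup' screen as untrustworthy."""
    suspicious = "popup" in obs
    if suspicious and answer is None:
        return Decision("query", f"Unexpected screen '{obs}'. Dismiss it?")
    if suspicious:
        return Decision("act", "dismiss" if answer == "yes" else "wait")
    return Decision("act", f"tap target on {obs}")


if __name__ == "__main__":
    for line in run_episode(toy_policy, lambda q: "yes",
                            ["home", "popup ad", "settings"]):
        print(line)
```

In this toy setup the agent acts directly on the ordinary screens and asks once on the unexpected popup. In the paper's framing, the decision of when to ask is learned rather than hard-coded as it is here.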