Faithful Mobile GUI Agents with Guided Advantage Estimator
arXiv cs.AI / 5/5/2026
Key Points
- The paper argues that vision-language GUI agents can act unfaithfully by using memorized shortcuts instead of grounding actions in visible screen evidence or user instructions.
- It introduces Faithful-Agent, a faithfulness-first framework that reshapes GUI interaction to emphasize evidence-groundedness and internal consistency.
- Faithful-Agent uses a two-stage training pipeline: a faithfulness-oriented supervised fine-tuning (SFT) stage that teaches the agent to abstain when the on-screen evidence has been perturbed, followed by a reinforcement fine-tuning (RFT) stage that further boosts faithfulness.
- The RFT stage adds a guided advantage estimator (GuAE) built on Group Relative Policy Optimization (GRPO), designed to prevent advantage collapse in low-variance rollout groups, which arise when GUI rewards are sparse.
- With an additional thought-action consistency reward, the Stage-II method raises Trap SR from the baseline's 13.88% to 80.21% while maintaining strong performance on general instruction following.
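
The advantage collapse mentioned in the key points can be illustrated with a minimal sketch. In standard GRPO, each rollout's advantage is its reward minus the group mean, divided by the group standard deviation, so a group in which every rollout gets the same sparse reward yields all-zero advantages and no learning signal. The summary does not give GuAE's formula, so `guided_advantages` below is a hypothetical stand-in that only shows the general idea of falling back to an external reference when the group has no variance; the function names, the `reference_reward` parameter, and the `min_std` threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: (r - group mean) / (group std + eps).

    Uses the population standard deviation; implementations vary on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def guided_advantages(rewards, reference_reward, min_std=1e-6, eps=1e-8):
    """Hypothetical guided fallback, NOT the paper's GuAE formula.

    If the group has usable variance, behave like plain GRPO. Otherwise
    measure each reward against an assumed external reference value so
    that a uniform group still produces a non-zero signal.
    """
    if pstdev(rewards) > min_std:
        return grpo_advantages(rewards, eps)
    return [r - reference_reward for r in rewards]


if __name__ == "__main__":
    mixed_group = [1.0, 0.0, 0.0, 0.0]    # one rollout succeeded
    failed_group = [0.0, 0.0, 0.0, 0.0]   # sparse reward: every rollout failed

    print(grpo_advantages(mixed_group))          # informative, non-zero values
    print(grpo_advantages(failed_group))         # collapses to all zeros
    print(guided_advantages(failed_group, 0.5))  # fallback keeps a signal
```

In this sketch the all-failed group contributes nothing under plain GRPO, while the fallback assigns every rollout a negative advantage relative to the assumed reference. How GuAE actually derives its guidance signal is described in the paper itself, not in this summary.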