UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

arXiv cs.LG / 3/26/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper introduces UI-Voyager, a two-stage autonomous mobile GUI agent designed to learn efficiently from failures in long-horizon, sparse-reward Android GUI tasks.
In the first stage, it uses Rejection Fine-Tuning (RFT) to continuously co-evolve data and models in an autonomous loop, reducing reliance on manual annotation.
In the second stage, it applies Group Relative Self-Distillation (GRSD) to locate critical fork points across group rollouts and generate dense step-level supervision from successful trajectories.
Experiments on AndroidWorld report that the 4B model reaches an 81.0% Pass@1 success rate, outperforming many recent baselines and surpassing human-level performance.
Ablation studies and case analyses support GRSD’s effectiveness in improving learning signal quality and credit assignment under ambiguous outcomes.

Abstract

Autonomous mobile GUI agents have attracted increasing attention along with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To that end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant leap toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.

Regulating Prompt Markets: Securities Law, Intellectual Property, and the Trading of Prompt Assets

Dev.to

Mercor competitor Deccan AI raises $25M, sources experts from India

Dev.to

How We Got Local MCP Servers Working in Claude Cowork (The Missing Guide)

Dev.to

How Should Students Document AI Usage in Academic Work?

Dev.to

I asked my AI agent to design a product launch image. Here's what came back.

Dev.to

UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

Key Points

Abstract

Related Articles

Regulating Prompt Markets: Securities Law, Intellectual Property, and the Trading of Prompt Assets

Mercor competitor Deccan AI raises $25M, sources experts from India

How We Got Local MCP Servers Working in Claude Cowork (The Missing Guide)

How Should Students Document AI Usage in Academic Work?

I asked my AI agent to design a product launch image. Here's what came back.

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer