Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Key Points

The paper addresses the high cost of online reinforcement learning for Android agents, highlighting inefficiencies of the prevailing Single State Single Action training paradigm under emulator latency and limited exploration.

Abstract

Online reinforcement learning (RL) serves as an effective method for enhancing the capabilities of Android agents. However, guiding agents to learn through online interaction is prohibitively expensive due to the high latency of emulators and the sample inefficiency of existing RL algorithms. We identify a fundamental limitation in current approaches: the Single State Single Action paradigm, which updates the policy with one-to-one state-action pairs from online one-way rollouts without fully exploring each costly emulator state. In this paper, we propose Android Coach, a novel framework that shifts the training paradigm to Single State Multiple Actions, allowing the agent to sample and utilize multiple actions for a single online state. We enable this without additional emulator overhead by learning a critic that estimates action values. To ensure the critic serves as a reliable coach, we integrate a process reward model and introduce a group-wise advantage estimator based on the averaged critic outputs. Extensive experiments demonstrate the effectiveness and efficiency of Android Coach: it achieves 7.5% and 8.3% success rate improvements on AndroidLab and AndroidWorld over UI-TARS-1.5-7B, and attains 1.4x higher training efficiency than Single State Single Action methods PPO and GRPO at matched success rates.

Android Coach: Improve Online Agentic Training Efficiency with Single State Multiple Actions

Key Points

Abstract

Related Articles

Why Anthropic’s new model has cybersecurity experts rattled

Does the AI 2027 paper still hold any legitimacy?

Why Most Productivity Systems Fail (And What to Do Instead)

Moving from proof of concept to production: what we learned with Nometria

Frontend Engineers Are Becoming AI Trainers

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

Related Articles

Why Anthropic’s new model has cybersecurity experts rattled
Reddit r/artificial

Does the AI 2027 paper still hold any legitimacy?
Reddit r/artificial

Why Most Productivity Systems Fail (And What to Do Instead)
Dev.to

Moving from proof of concept to production: what we learned with Nometria
Dev.to

Frontend Engineers Are Becoming AI Trainers
Dev.to