IntentScore: Intent-Conditioned Action Evaluation for Computer-Use Agents

arXiv cs.AI / 4/8/2026

Key Points

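The paper proposes IntentScore, a plan-aware reward model that addresses a core weakness of Computer-Use Agents: they execute candidate actions without evaluating their quality, so irreversible errors cascade through subsequent steps.
IntentScore learns from 398K offline GUI interaction steps spanning three operating systems and is trained with two objectives: (1) contrastive alignment for state-action relevance and (2) margin ranking for action correctness.
Architecturally, it embeds each candidate's planning intent in the action encoder, so it can distinguish candidates that perform similar actions for different rationales (intents).
It achieves 97.5% pairwise discrimination accuracy on held-out evaluation, and when used as a re-ranker for Agent S3 on OSWorld, an environment unseen during training, it improves task success rate by 6.9 points.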
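To make the two training objectives concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not give the exact loss formulas, so the hinge form of the margin ranking loss, the InfoNCE form of the contrastive loss, and the `margin` and `temperature` values are all assumptions chosen for illustration.

```python
import math


def margin_ranking_loss(score_pos, score_neg, margin=1.0):
    """Hinge-style ranking loss (assumed form).

    Returns zero once the correct action outscores the incorrect one
    by at least `margin`; otherwise the loss grows linearly with the gap.
    """
    return max(0.0, margin - (score_pos - score_neg))


def info_nce_loss(sim_matrix, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed form).

    sim_matrix[i][j] is the similarity between state i and action j in a
    batch; the diagonal entries are the matching state-action pairs.
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [s / temperature for s in row]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(sim_matrix)


# Hypothetical scores: the ranking loss vanishes when the gap exceeds the margin.
well_separated = margin_ranking_loss(2.0, 0.5)
# Hypothetical similarities: matched pairs on the diagonal give a lower loss.
aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce_loss([[0.0, 1.0], [1.0, 0.0]])
```

In this reading, the contrastive term pulls each screen state toward the action that actually followed it, while the ranking term pushes the score of a correct action above that of an incorrect one for the same state.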

Abstract

Computer-Use Agents (CUAs) leverage large language models to execute GUI operations on desktop environments, yet they generate actions without evaluating action quality, leading to irreversible errors that cascade through subsequent steps. We propose IntentScore, a plan-aware reward model that learns to score candidate actions from 398K offline GUI interaction steps spanning three operating systems. IntentScore trains with two complementary objectives: contrastive alignment for state-action relevance and margin ranking for action correctness. Architecturally, it embeds each candidate's planning intent in the action encoder, enabling discrimination between candidates with similar actions but different rationales. IntentScore achieves 97.5% pairwise discrimination accuracy on held-out evaluation. Deployed as a re-ranker for Agent S3 on OSWorld, an environment entirely unseen during training, IntentScore improves task success rate by 6.9 points, demonstrating that reward estimation learned from heterogeneous offline trajectories generalizes to unseen agents and task distributions.
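The re-ranking deployment described above can be sketched as follows. This is an illustration under stated assumptions, not the Agent S3 or IntentScore interface: `Candidate`, `rerank`, and `toy_score` are hypothetical names, and the word-overlap scorer is a stand-in for the learned reward model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    intent: str  # the planner's stated rationale for the step
    action: str  # the concrete GUI operation, e.g. "click(412, 96)"


def rerank(
    state: str,
    candidates: Sequence[Candidate],
    score_fn: Callable[[str, Candidate], float],
) -> Candidate:
    """Return the candidate that the scoring function rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return max(candidates, key=lambda c: score_fn(state, c))


def toy_score(state: str, candidate: Candidate) -> float:
    """Stand-in scorer: counts words shared by the state and the intent.

    A real reward model would encode the screen state and the
    intent-conditioned action; this only makes the example runnable.
    """
    state_words = set(state.lower().split())
    intent_words = set(candidate.intent.lower().split())
    return float(len(state_words & intent_words))


state = "save dialog open with filename field"
candidates = [
    Candidate(intent="close the dialog without saving", action="click(412, 96)"),
    Candidate(intent="confirm the filename and save", action="click(412, 96)"),
]
best = rerank(state, candidates, toy_score)
```

Both candidates issue the same click, so only the intent separates them, which is the case the abstract highlights: candidates with similar actions but different rationales.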
