Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

arXiv cs.AI / 4/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper highlights that traditional GUI usability testing with experts and users is costly and time-consuming, motivating automated approaches using computer use agents and generative agents.
  • It argues that existing agents still fail to deliver sufficiently accurate usability assessments, even when they can simulate interactions and preferences.
  • The authors propose a new machine learning method to operationalize a computational definition of usability, training CUAs to (1) focus on key interaction flows, (2) execute them via human-like interactions, and (3) predict a numeric usability score.
  • They introduce uxCUA, trained on a large dataset of fully interactive UIs with usability labels and human preference data, and report improved performance over larger models for usability scoring and critique quality.
  • The work positions itself as a principled, data-driven foundation for automated usability assessment in HCI (human-computer interaction).

Abstract

Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs), but doing so remains a costly and time-intensive process. Prior work has used computer use agents (CUAs) and other generative agents that can simulate user interactions and preferences, but we show that agents still struggle to provide accurate usability assessments. In this work, we present a novel machine learning method that operationalizes a computational definition of usability to train CUAs to assess GUI usability by i) prioritizing important interaction flows, ii) executing them through human-like interactions, and iii) predicting a learned numerical usability score. We train a computer use agent, uxCUA, with our algorithm on a large-scale dataset of fully interactive user interfaces (UIs) paired with usability labels and human preferences. We show that uxCUA outperforms larger models in producing accurate usability assessments and realistic critiques of both synthetic and real UIs. More broadly, our work aims to build a principled, data-driven foundation for automated usability assessment in HCI.
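To make the three-stage pipeline concrete, here is a minimal sketch of how an assessment loop of this shape could be wired together. All names (`Flow`, `prioritize`, `execute`, `predict_score`, `assess`) and the scoring heuristic are hypothetical stand-ins: the paper's actual agent learns these components from data, whereas this sketch uses trivial stubs purely to show the control flow of prioritize → execute → score.

```python
from dataclasses import dataclass, field

@dataclass
class Flow:
    """A candidate interaction flow through the UI (hypothetical structure)."""
    name: str
    importance: float          # agent-estimated priority of this flow
    steps: list = field(default_factory=list)  # UI actions, e.g. clicks

def prioritize(flows, k=2):
    """Step i): keep the k interaction flows judged most important."""
    return sorted(flows, key=lambda f: f.importance, reverse=True)[:k]

def execute(flow):
    """Step ii): replay the flow with human-like interactions.
    Stub: just counts steps; a real CUA would drive the live UI."""
    return {"flow": flow.name, "completed": len(flow.steps)}

def predict_score(traces):
    """Step iii): map execution traces to a numeric usability score.
    Stub heuristic standing in for the learned scoring component."""
    total = sum(t["completed"] for t in traces)
    return min(100.0, 50.0 + 10.0 * total)

def assess(flows):
    """End-to-end assessment: prioritize, execute, then score."""
    traces = [execute(f) for f in prioritize(flows)]
    return predict_score(traces), traces

# Usage with two toy flows (names are illustrative only):
flows = [
    Flow("checkout", importance=0.9, steps=["click_cart", "click_pay"]),
    Flow("settings", importance=0.2, steps=["open_menu"]),
]
score, traces = assess(flows)
```

The stub heuristic rewards flows the agent could complete; in the paper the score is instead learned from usability labels and human preference data, which is what distinguishes the method from hand-written heuristics like this one.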