Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

arXiv cs.AI / 4/30/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper highlights that traditional GUI usability testing with experts and users is costly and time-consuming, motivating automated approaches using computer use agents and generative agents.
  • It argues that existing agents still fail to deliver sufficiently accurate usability assessments, even when they can simulate interactions and preferences.
  • The authors propose a new machine learning method to operationalize a computational definition of usability, training CUAs to (1) focus on key interaction flows, (2) execute them via human-like interactions, and (3) predict a numeric usability score.
  • They introduce uxCUA, trained on a large dataset of fully interactive UIs with usability labels and human preference data, and report improved performance over larger models for usability scoring and critique quality.
  • The work positions itself as a principled, data-driven foundation for automated usability assessment in HCI (human-computer interaction).

Abstract

Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs), but doing so remains a costly and time-intensive process. Prior work has used computer use agents (CUAs) and other generative agents that can simulate user interactions and preferences, but we show that agents still struggle to provide accurate usability assessments. In this work, we present a novel machine learning method that operationalizes a computational definition of usability to train CUAs to assess GUI usability by i) prioritizing important interaction flows, ii) executing them through human-like interactions, and iii) predicting a learned numerical usability score. We train a computer use agent, uxCUA, with our algorithm on a large-scale dataset of fully interactive user interfaces (UIs) paired with usability labels and human preferences. We show that uxCUA outperforms larger models in producing accurate usability assessments and realistic critiques of both synthetic and real UIs. More broadly, our work aims to build a principled, data-driven foundation for automated usability assessment in HCI.
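To make the three-stage pipeline concrete, here is a minimal sketch of how an assessment loop of this shape could be wired together. All names (`Flow`, `prioritize`, `execute`, `predict_score`, `assess`) and the scoring heuristic are hypothetical stand-ins: the paper's actual agent learns these components from data, whereas this sketch uses trivial stubs purely to show the control flow of prioritize → execute → score.

```python
from dataclasses import dataclass, field

@dataclass
class Flow:
    """A candidate interaction flow through the UI (hypothetical structure)."""
    name: str
    importance: float          # agent-estimated priority of this flow
    steps: list = field(default_factory=list)  # UI actions, e.g. clicks

def prioritize(flows, k=2):
    """Step i): keep the k interaction flows judged most important."""
    return sorted(flows, key=lambda f: f.importance, reverse=True)[:k]

def execute(flow):
    """Step ii): replay the flow with human-like interactions.
    Stub: just counts steps; a real CUA would drive the live UI."""
    return {"flow": flow.name, "completed": len(flow.steps)}

def predict_score(traces):
    """Step iii): map execution traces to a numeric usability score.
    Stub heuristic standing in for the learned scoring component."""
    total = sum(t["completed"] for t in traces)
    return min(100.0, 50.0 + 10.0 * total)

def assess(flows):
    """End-to-end assessment: prioritize, execute, then score."""
    traces = [execute(f) for f in prioritize(flows)]
    return predict_score(traces), traces

# Usage with two toy flows (names are illustrative only):
flows = [
    Flow("checkout", importance=0.9, steps=["click_cart", "click_pay"]),
    Flow("settings", importance=0.2, steps=["open_menu"]),
]
score, traces = assess(flows)
```

The stub heuristic rewards flows the agent could complete; in the paper the score is instead learned from usability labels and human preference data, which is what distinguishes the method from hand-written heuristics like this one.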