A11y-Compressor: A Framework for Enhancing the Efficiency of GUI Agent Observations through Visual Context Reconstruction and Redundancy Reduction

arXiv cs.AI / 5/4/2026



Key Points

The paper introduces A11y-Compressor, a framework that converts linearized GUI accessibility trees into more compact, structured observation representations for GUI agents.
It addresses two key limitations of the accessibility tree format, redundancy and the absence of structural information such as spatial relationships between elements, through a transformation pipeline.
The proposed implementation, Compressed-a11y, uses lightweight steps including modal detection, redundancy reduction, and semantic structuring to rebuild useful context.
Experiments on the OSWorld benchmark show that input token usage drops to 22% of the original while the average task success rate improves by 5.1 percentage points.
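The summary names the pipeline stages but does not say how Compressed-a11y implements them. The sketch below illustrates what modal detection and redundancy reduction could look like on a linearized tree. Everything in it is an assumption made for illustration rather than the paper's method: the `Node` fields, the "first visible dialog or alert is modal" heuristic, the bounding-box containment filter, and the dedup rule are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One entry of a linearized accessibility tree (hypothetical fields)."""
    role: str   # e.g. "push-button", "dialog"
    name: str   # accessible name or visible label
    x: int      # top-left corner, screen pixels
    y: int
    w: int      # width and height, screen pixels
    h: int
    visible: bool = True


def inside(inner: Node, outer: Node) -> bool:
    """True if inner's bounding box lies entirely within outer's."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


def detect_modal(nodes: list[Node]) -> Node | None:
    """Assumed heuristic: the first visible dialog or alert is the active modal."""
    for n in nodes:
        if n.visible and n.role in {"dialog", "alert"}:
            return n
    return None


def compress(nodes: list[Node]) -> list[str]:
    modal = detect_modal(nodes)
    if modal is not None:
        # While a modal is open, keep only elements drawn inside it
        # (the modal itself passes, since a box lies inside itself).
        nodes = [n for n in nodes if inside(n, modal)]
    seen: set[tuple[str, str, int, int]] = set()
    lines: list[str] = []
    for n in nodes:
        # Redundancy reduction: skip invisible or unlabeled entries...
        if not n.visible or not n.name.strip():
            continue
        # ...and exact repeats (same role, label, and position).
        key = (n.role, n.name, n.x, n.y)
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"[{n.role}] {n.name} @({n.x},{n.y})")
    return lines


tree = [
    Node("frame", "Untitled - Editor", 0, 0, 1920, 1080),
    Node("menu-item", "File", 10, 5, 40, 20),
    Node("dialog", "Save changes?", 700, 400, 500, 200),
    Node("label", "Save changes?", 720, 420, 300, 20),
    Node("push-button", "Save", 720, 540, 80, 30),
    Node("push-button", "Save", 720, 540, 80, 30),   # exact repeat
    Node("push-button", "", 820, 540, 80, 30),       # unlabeled
    Node("push-button", "Cancel", 920, 540, 80, 30, visible=False),
]
for line in compress(tree):
    print(line)
```

On this toy input, the eight entries shrink to three lines: the dialog, its label, and a single Save button. The background window and menu are hidden behind the modal, so they are dropped. Filtering by bounding-box containment is only one possible choice; a real implementation could instead use the tree's parent/child links.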

Abstract

AI agents that interact with graphical user interfaces (GUIs) require effective observation representations for reliable grounding. The accessibility tree is a commonly used text-based format that encodes UI element attributes, but it suffers from redundancy and lacks structural information such as spatial relationships among elements. We propose A11y-Compressor, a framework that transforms linearized accessibility trees into compact and structured representations. Our implementation, Compressed-a11y, applies a lightweight and structured transformation pipeline with modal detection, redundancy reduction, and semantic structuring. Experiments on the OSWorld benchmark show that Compressed-a11y reduces input tokens to 22% of the original while improving task success rates by 5.1 percentage points on average.
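The abstract points out that linearized trees lack spatial relationships among elements. One plausible way to recover part of that layout is to cluster elements into visual rows using their bounding boxes, then emit them top to bottom and left to right. The sketch below follows that approach under our own assumptions: the `Box` type, the row-grouping rule (a box joins a row when its vertical centre falls within the row's first box), and the `|`-separated rendering are illustrative and are not taken from the paper.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A labeled UI element with a screen-space bounding box (hypothetical)."""
    label: str
    x: int
    y: int
    w: int
    h: int


def group_rows(boxes: list[Box]) -> list[list[Box]]:
    """Cluster boxes into visual rows, top to bottom, each sorted left to right."""
    rows: list[list[Box]] = []
    for b in sorted(boxes, key=lambda box: (box.y, box.x)):
        centre = b.y + b.h / 2
        if rows:
            anchor = rows[-1][0]
            # Same row if this box's vertical centre falls within the anchor's extent.
            if anchor.y <= centre <= anchor.y + anchor.h:
                rows[-1].append(b)
                continue
        rows.append([b])
    return [sorted(row, key=lambda box: box.x) for row in rows]


def render(rows: list[list[Box]]) -> str:
    """One text line per visual row, with elements separated by ' | '."""
    return "\n".join(" | ".join(b.label for b in row) for row in rows)


boxes = [
    Box("Cancel", 300, 102, 80, 28),
    Box("Name:", 10, 50, 60, 20),
    Box("OK", 200, 100, 80, 30),
    Box("[text field]", 80, 48, 200, 24),
]
print(render(group_rows(boxes)))
```

Given this input, the sketch pairs the "Name:" label with its text field on one line and places OK and Cancel together on the next, which makes the label-to-field association explicit in a way a flat list does not.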
