A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

arXiv cs.CL / 4/22/2026

📰 News · Developer Stack & Infrastructure · Models & Research

Key Points

  • The paper argues that long-horizon, terminal-centric agents often keep raw environment feedback in the dialogue history, creating heavy redundancy and causing token costs to grow roughly quadratically with the number of steps.
  • It proposes TACO, a plug-and-play Terminal Agent Compression framework that self-evolves by automatically discovering and refining observation compression rules from interaction trajectories.
  • Experiments on TerminalBench (TB 1.0 and TB 2.0) and four other terminal-related benchmarks show TACO improves performance across mainstream agent frameworks and strong backbone models.
  • Using MiniMax-2.5, TACO boosts performance on most benchmarks while cutting token overhead by about 10%.
  • On TerminalBench, it yields consistent 1%–4% gains across strong agentic models and improves accuracy by roughly 2%–3% under the same token budget, indicating good generalization of task-aware compression.
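The quadratic-cost claim in the first bullet can be sketched with a toy model (all numbers below are illustrative, not taken from the paper): if every step appends a raw observation to the history and the full history is re-sent each step, total prompt tokens grow with the triangular number of steps, i.e. O(n²) in the number of steps; shrinking each stored observation cuts that cost proportionally.

```python
# Illustrative sketch (assumed, not from the paper): why keeping raw
# environment feedback in the dialogue history makes cumulative token
# cost grow quadratically with the number of agent steps.

def cumulative_tokens(steps: int, obs_tokens: int) -> int:
    """Total tokens sent to the model over an episode, assuming the full
    history (all prior observations) is re-sent at every step."""
    # At step t the prompt holds t observations of obs_tokens each, so the
    # total is obs_tokens * (1 + 2 + ... + steps) = obs_tokens * O(steps^2).
    return obs_tokens * steps * (steps + 1) // 2

raw = cumulative_tokens(steps=50, obs_tokens=500)        # raw feedback kept
compressed = cumulative_tokens(steps=50, obs_tokens=50)  # 10x compression
print(raw, compressed)  # prints 637500 63750
```

The growth term itself stays quadratic in the step count; compression attacks the per-observation constant, which is why even a fixed compression ratio yields large absolute savings on long-horizon episodes.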

Abstract

As model capabilities advance, research has increasingly shifted toward long-horizon, multi-turn terminal-centric agentic tasks, where raw environment feedback is often preserved in the interaction history to support future decisions. However, repeatedly retaining such feedback introduces substantial redundancy and causes cumulative token cost to grow quadratically with the number of steps, hindering long-horizon reasoning. Although observation compression can mitigate this issue, the heterogeneity of terminal environments makes heuristic-based or fixed-prompt methods difficult to generalize. We propose TACO, a plug-and-play, self-evolving Terminal Agent Compression framework that automatically discovers and refines compression rules from interaction trajectories for existing terminal agents. Experiments on TerminalBench (TB 1.0 and TB 2.0) and four additional terminal-related benchmarks (i.e., SWE-Bench Lite, CompileBench, DevEval, and CRUST-Bench) show that TACO consistently improves performance across mainstream agent frameworks and strong backbone models. With MiniMax-2.5, it improves performance on most benchmarks while reducing token overhead by around 10%. On TerminalBench, it brings consistent gains of 1%-4% across strong agentic models, and further improves accuracy by around 2%-3% under the same token budget. These results demonstrate the effectiveness and generalization of self-evolving, task-aware compression for terminal agents.