LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

arXiv cs.AI / 4/20/2026

📰 NewsDeveloper Stack & InfrastructureModels & Research

共有:

Key Points

The paper introduces LinuxArena, a new control environment where AI agents operate on live, multi-service production-like software setups, including both legitimate engineering tasks and safety-failure side tasks.
LinuxArena includes 20 environments, 1,671 main tasks, and 184 side tasks (e.g., data exfiltration and backdooring), making it a highly diverse and large testbed for software-engineering control research.
The researchers evaluate sabotage and monitoring by testing whether attackers can perform side tasks while completing main tasks, and whether a monitor model can detect sabotage attempts.
Using a trusted GPT-5-nano monitor with a 1% step-wise false positive rate, Claude Opus 4.6 still achieves about a 23% undetected sabotage success rate, indicating monitoring gaps under current conditions.
They also release LaStraj, a human-crafted dataset of attack trajectories that bypass monitors at higher rates than model-generated attacks, suggesting existing attack policies do not fully cover the challenge space.

Abstract

We introduce LinuxArena, a control setting in which agents operate directly on live, multi-service production environments. LinuxArena contains 20 environments, 1,671 main tasks representing legitimate software engineering work, and 184 side tasks representing safety failures such as data exfiltration and backdooring, making it the largest and most diverse control setting for software engineering to date. We validate LinuxArena is useful for control research by running sabotage evaluations, which measure whether attackers can complete side tasks while working on main tasks, and monitor evaluations, which measure a monitor model's ability to detect sabotage attempts. Against a GPT-5-nano trusted monitor at a 1\% step-wise false positive rate, Claude Opus 4.6 achieves roughly a 23% undetected sabotage success rate. We additionally release LaStraj, a dataset of human-crafted attack trajectories that evade monitors at substantially higher rates than any model-generated attacks we elicited, showing that current attack policies do not saturate LinuxArena. These results suggest that LinuxArena has meaningful headroom for both attackers and defenders, making it a strong testbed for developing and evaluating future control protocols.

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

Dev.to

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Dev.to

Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else

Dev.to

Local LLM Beginner’s Guide (Mac - Apple Silicon)

Reddit r/artificial

Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals

Dev.to

LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

Key Points

Abstract

Related Articles

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else

Local LLM Beginner’s Guide (Mac - Apple Silicon)

Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer