Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

arXiv cs.AI / 4/30/2026



Key Points

The paper proposes “codename,” a telemetry-driven behavioral firewall that detects and blocks anomalous tool-call sequences for structured-workflow LLM agents operating on sensitive external environments.
It compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA), defining allowed tool sequences, sequential contexts, and parameter bounds, so enforcement becomes an efficient runtime state-transition lookup.
On the Agent Security Bench (ASB), codename achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios; within three structured workflows, ASR drops to 2.2%, compared with 12.8% for Aegis, a state-of-the-art stateless scanner.
The system achieves 0% ASR on multi-step and context-sequential attacks in structured settings. Of 1,000 algorithmically spliced exfiltration payloads, only 1.4% (14 paths) matched valid structural paths, and all 14 failed end-to-end string parameter guards.
Runtime overhead is 2.2 ms per tool call (a 3.7× speedup over Aegis) with a 2.0% benign task failure rate (BTFR). The authors note that unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate), so exact-match whitelisting of sensitive parameters is the final safeguard.
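
The compile-then-lookup idea in the points above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the prefix-tree construction, the `Guard`, `compile_pdfa`, and `Gateway` names, the tool names, and the example traces are all hypothetical, and parameters are assumed to be plain numbers or strings. Numeric parameters get learned min/max bounds, string parameters get an exact-match whitelist, and each runtime check is a single dictionary lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Guard:
    """Parameter guard attached to one transition (hypothetical design)."""
    lo: dict = field(default_factory=dict)       # numeric lower bound per parameter
    hi: dict = field(default_factory=dict)       # numeric upper bound per parameter
    allowed: dict = field(default_factory=dict)  # exact-match whitelist per string parameter

    def learn(self, params):
        for key, value in params.items():
            if isinstance(value, (int, float)):
                self.lo[key] = min(self.lo.get(key, value), value)
                self.hi[key] = max(self.hi.get(key, value), value)
            else:
                self.allowed.setdefault(key, set()).add(value)

    def permits(self, params):
        for key, value in params.items():
            if isinstance(value, (int, float)):
                if key not in self.lo or not (self.lo[key] <= value <= self.hi[key]):
                    return False
            elif value not in self.allowed.get(key, set()):
                return False
        return True


def compile_pdfa(traces):
    """Offline step: fold benign traces into (state, tool) -> (next_state, Guard)."""
    transitions = {}
    next_id = 1  # state 0 is the start state
    for trace in traces:
        state = 0
        for tool, params in trace:
            if (state, tool) not in transitions:
                transitions[(state, tool)] = (next_id, Guard())
                next_id += 1
            state, guard = transitions[(state, tool)]
            guard.learn(params)
    return transitions


class Gateway:
    """Runtime step: one dictionary lookup plus a guard check per tool call."""

    def __init__(self, transitions):
        self.transitions = transitions
        self.state = 0

    def allow(self, tool, params):
        entry = self.transitions.get((self.state, tool))
        if entry is None:
            return False  # tool is not permitted in the current sequential context
        next_state, guard = entry
        if not guard.permits(params):
            return False  # parameters fall outside the learned bounds or whitelist
        self.state = next_state
        return True


# Hypothetical benign telemetry: read a record, then email a fixed recipient.
benign_traces = [
    [("read_record", {"id": 10}), ("send_email", {"to": "billing@example.com"})],
    [("read_record", {"id": 42}), ("send_email", {"to": "billing@example.com"})],
]
pdfa = compile_pdfa(benign_traces)
gateway = Gateway(pdfa)
print(gateway.allow("read_record", {"id": 17}))                     # True: in sequence, id within [10, 42]
print(gateway.allow("send_email", {"to": "attacker@example.net"}))  # False: recipient not whitelisted
print(gateway.allow("send_email", {"to": "billing@example.com"}))   # True: the blocked call did not advance state
```

In this sketch a blocked call leaves the automaton state unchanged, so a later benign call can still proceed. A real gateway might instead terminate the session on the first violation; the summary does not say which policy the authors use.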

Abstract

Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an O(1) state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8%. \codename achieves 0% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95% CI [0%, 23.2%]). \codename introduces just 2.2 ms of per-call latency (a 3.7× speedup over Aegis) while maintaining a 2.0% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.
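
The reported interval for 0 successes out of 14 surviving paths can be checked with a short Python sketch. The abstract does not name the interval method; the two-sided Clopper-Pearson (exact binomial) interval is assumed here because it reproduces the stated 23.2% upper bound. The function name is hypothetical. For zero observed successes, the upper bound p solves (1 - p)^n = α/2.

```python
def clopper_pearson_upper_zero_successes(n, confidence=0.95):
    """Upper limit of the two-sided Clopper-Pearson interval when 0 of n trials succeed.

    Assumption: the abstract's interval is Clopper-Pearson; the source does not say.
    With zero successes, the bound p solves (1 - p) ** n == alpha / 2.
    """
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


payloads = 1000
surviving = round(payloads * 0.014)  # 1.4% of 1,000 payloads matched a valid structural path
upper = clopper_pearson_upper_zero_successes(surviving)
print(surviving, f"{upper:.1%}")  # 14 23.2%
```

The width of the interval reflects the small sample: with only 14 surviving paths, zero observed successes still leaves room for a true success rate of up to roughly 23%.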


