Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents

arXiv cs.AI / 4/30/2026


Key Points

  • The paper proposes “codename,” a telemetry-driven behavioral firewall that detects and blocks anomalous tool-call sequences for structured-workflow LLM agents operating on sensitive external environments.
  • It compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA), defining allowed tool sequences, sequential contexts, and parameter bounds, so enforcement becomes an efficient runtime state-transition lookup.
  • On the Agent Security Bench (ASB), codename achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2%, compared with 12.8% for Aegis, a state-of-the-art stateless scanner.
  • The system achieves 0% ASR on multi-step and context-sequential attacks in structured settings. Of 1,000 algorithmically spliced exfiltration payloads, only 1.4% matched valid structural paths, and all 14 of those surviving paths failed the end-to-end string parameter guards.
  • Runtime overhead is low (2.2 ms per tool call) with a 2.0% benign task failure rate. The authors note, however, that unmaintained continuous parameter bounds can be evaded via synonym substitution (18% evasion rate), which leaves exact-match whitelisting of sensitive parameters as the final safeguard.
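
To make the enforcement idea concrete, here is a minimal Python sketch of a parameterized DFA gateway. It is an illustration under stated assumptions, not the paper's implementation: the class names (PDFA, Gateway), the tool names, the state labels, and the guard functions are all hypothetical, and the transition table is written by hand where the paper compiles it offline from verified benign telemetry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# All names below are hypothetical and chosen only for illustration.
Guard = Callable[[dict], bool]


@dataclass
class PDFA:
    """Transition table: (state, tool) -> (next_state, parameter guard)."""
    start: str
    transitions: Dict[Tuple[str, str], Tuple[str, Guard]] = field(default_factory=dict)

    def add(self, state: str, tool: str, next_state: str,
            guard: Guard = lambda params: True) -> None:
        self.transitions[(state, tool)] = (next_state, guard)


class Gateway:
    """Runtime check: one dictionary lookup plus one guard call per tool call."""

    def __init__(self, automaton: PDFA) -> None:
        self.automaton = automaton
        self.state = automaton.start

    def check(self, tool: str, params: dict) -> bool:
        entry = self.automaton.transitions.get((self.state, tool))
        if entry is None:
            return False  # tool is not permitted in the current sequential context
        next_state, guard = entry
        if not guard(params):
            return False  # structurally valid call, but parameters are out of bounds
        self.state = next_state
        return True


# Assumed workflow: read a ticket, look up a customer, send one email.
ALLOWED_RECIPIENTS = {"billing@example.com"}  # exact-match whitelist (assumed)

pdfa = PDFA(start="s0")
pdfa.add("s0", "read_ticket", "s1")
pdfa.add("s1", "lookup_customer", "s2", lambda p: 0 < p.get("limit", 0) <= 10)
pdfa.add("s2", "send_email", "s3", lambda p: p.get("to") in ALLOWED_RECIPIENTS)

gateway = Gateway(pdfa)
results = [
    gateway.check("read_ticket", {}),
    gateway.check("lookup_customer", {"limit": 5}),
    gateway.check("send_email", {"to": "attacker@evil.test"}),   # blocked by guard
    gateway.check("send_email", {"to": "billing@example.com"}),  # allowed
]
print(results)
```

In this sketch a blocked call leaves the automaton state unchanged, so a later benign call can still proceed. Whether a real deployment should instead terminate the session after a violation is a policy choice that the summary does not specify.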

Abstract

Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, codename compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an O(1) state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), codename achieves a 5.6% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8%. codename achieves 0% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95% CI [0%, 23.2%]). codename introduces just 2.2 ms of per-call latency (a 3.7× speedup over Aegis) while maintaining a 2.0% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.
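
The reported interval for 0 successes out of 14 surviving paths can be checked with a few lines of Python. The abstract does not say which interval method was used; the sketch below assumes a two-sided exact (Clopper-Pearson) interval, which for zero observed successes has a closed-form upper limit and reproduces the stated 23.2% bound. The function name is hypothetical.

```python
def exact_upper_bound_zero_successes(n: int, alpha: float = 0.05) -> float:
    """Upper limit of the two-sided Clopper-Pearson interval when 0 of n trials succeed.

    With zero successes the upper limit p solves (1 - p)**n = alpha / 2.
    """
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


payloads = 1000
surviving = round(0.014 * payloads)  # 1.4% of 1,000 payloads matched a valid path
upper = exact_upper_bound_zero_successes(surviving)

print(f"surviving paths: {surviving}")
print(f"95% CI upper bound: {upper:.1%}")
```

The width of the interval reflects the small sample: with only 14 surviving paths, a zero observed success rate is still compatible with a true success rate of up to roughly 23%.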