Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility

arXiv cs.AI / 4/20/2026

Key Points

  • The paper addresses a key limitation of current AI-agent safety approaches: while training-based and neural guardrails improve reliability, they do not provide formal guarantees in high-stakes business domains where tool-using agents may cause privacy or financial harm.
  • It proposes symbolic guardrails as a practical way to deliver stronger safety and security guarantees for domain-specific agents, including a systematic review of 80 state-of-the-art agent safety/security benchmarks.
  • The authors analyze which policy requirements are actually enforceable via symbolic guardrails and evaluate the impact on safety, security, and task success across τ²-Bench, CAR-bench, and MedAgentBench.
  • They find that 85% of the surveyed benchmarks either do not specify concrete policies (relying instead on underspecified high-level goals) or state policies that are not enforceable as written; among the benchmarks with specified policies, 74% of policy requirements can be enforced by symbolic guardrails using relatively simple, low-cost mechanisms.
  • The study reports that symbolic guardrails improve safety and security without reducing agent utility, and the authors release code and artifacts publicly.

Abstract

AI agents that interact with their environments through tools enable powerful applications, but in high-stakes business settings, unintended actions can cause unacceptable harm, such as privacy breaches and financial loss. Existing mitigations, such as training-based methods and neural guardrails, improve agent reliability but cannot provide guarantees. We study symbolic guardrails as a practical path toward strong safety and security guarantees for AI agents. Our three-part study includes a systematic review of 80 state-of-the-art agent safety and security benchmarks to identify the policies they evaluate, an analysis of which policy requirements can be guaranteed by symbolic guardrails, and an evaluation of how symbolic guardrails affect safety, security, and agent success on τ²-Bench, CAR-bench, and MedAgentBench. We find that 85% of benchmarks lack concrete policies, relying instead on underspecified high-level goals or common sense. Among the specified policies, 74% of policy requirements can be enforced by symbolic guardrails, often using simple, low-cost mechanisms. These guardrails improve safety and security without sacrificing agent utility. Overall, our results suggest that symbolic guardrails are a practical and effective way to guarantee some safety and security requirements, especially for domain-specific AI agents. We release all code and artifacts at https://github.com/hyn0027/agent-symbolic-guardrails.
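To make the idea concrete, a symbolic guardrail can be as simple as a deterministic, rule-based check that runs before every tool call an agent proposes, blocking calls that violate an explicit policy. The sketch below is an illustration of this general mechanism, not the paper's actual implementation; the tool names, policy thresholds, and function signatures are hypothetical.

```python
# Hypothetical sketch of a symbolic guardrail: a deterministic predicate
# evaluated over each proposed tool call before the agent runtime executes it.
# Rules and tool names below are illustrative, not taken from the paper.

def guardrail(tool_name: str, args: dict) -> tuple[bool, str]:
    """Return (allowed, reason). Explicit policy rules deny; everything else passes."""
    if tool_name == "issue_refund" and args.get("amount", 0) > 100:
        return False, "refunds above $100 require human approval"
    if tool_name == "share_record" and not args.get("patient_consent", False):
        return False, "cannot share a medical record without consent"
    return True, "allowed"

# The runtime consults the guardrail before invoking any tool:
allowed, reason = guardrail("issue_refund", {"amount": 250})
print(allowed, reason)
```

Because the check is a fixed symbolic rule rather than a learned classifier, it enforces the policy on every call by construction, which is the kind of guarantee the paper contrasts with training-based and neural approaches.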