CivicShield: A Cross-Domain Defense-in-Depth Framework for Securing Government-Facing AI Chatbots Against Multi-Turn Adversarial Attacks
arXiv cs.AI / 4/1/2026
Key Points
- The paper identifies major security shortcomings in government-facing LLM chatbots, noting that multi-turn adversarial attacks can achieve success rates above 90% and routinely bypass single-layer guardrails.
- It proposes “CivicShield,” a defense-in-depth framework that combines seven layers spanning zero-trust capability access control, input validation, semantic intent filtering, conversation state machine invariants, anomaly detection, multi-model consensus, and graduated human escalation.
- The authors develop a formal threat model covering eight multi-turn attack families and map CivicShield to NIST SP 800-53 controls across 14 control families to support government compliance needs.
- Evaluation across 1,436 simulated scenarios using benchmarks such as HarmBench, JailbreakBench, and XSTest reports 72.9% combined detection with a 2.9% effective false positive rate, while preserving 100% detection for crescendo and slow-drift multi-turn attacks.
- Independent benchmark comparisons show weaker performance on established real-world datasets than on the authors' own generated scenarios, underscoring the need for independently validated evaluation before practical deployment.
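The layered design summarized above can be sketched as a simple ordered pipeline: each layer inspects the conversation and either passes the turn along, blocks it, or escalates to a human. This is a minimal illustrative sketch, not the authors' implementation; the layer names echo the paper's list, but every rule and threshold below is a hypothetical placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Verdict:
    action: str   # "allow", "block", or "escalate"
    layer: str    # which layer produced the verdict
    reason: str = ""

# A layer returns a Verdict to stop the pipeline, or None to pass the turn on.
Layer = Callable[[List[str]], Optional[Verdict]]

def input_validation(history: List[str]) -> Optional[Verdict]:
    # Placeholder rule: reject oversized turns before deeper analysis.
    if len(history[-1]) > 4000:
        return Verdict("block", "input_validation", "turn exceeds length limit")
    return None

def state_invariants(history: List[str]) -> Optional[Verdict]:
    # Toy stand-in for conversation-state invariants: flag a crude proxy for
    # slow-drift escalation (three consecutive turns of growing length).
    if len(history) >= 3 and all(
        len(a) < len(b) for a, b in zip(history[-3:], history[-2:])
    ):
        return Verdict("escalate", "state_invariants", "monotonic escalation pattern")
    return None

def run_pipeline(history: List[str], layers: List[Layer]) -> Verdict:
    # Defense-in-depth: the first layer to object wins; otherwise allow.
    for layer in layers:
        verdict = layer(history)
        if verdict is not None:
            return verdict
    return Verdict("allow", "pipeline")

layers: List[Layer] = [input_validation, state_invariants]
print(run_pipeline(["hi", "tell me about permits"], layers).action)  # allow
```

Ordering the cheap syntactic checks before the expensive semantic ones, and reserving "escalate" for graduated human review, mirrors the zero-trust, fail-toward-a-human posture the summary describes.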