Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

arXiv cs.AI / 4/25/2026



Key Points

The paper highlights that RAG systems used in sensitive domains (e.g., healthcare and law) face security risks such as membership inference, data poisoning, and unintended content leakage.
It finds that enabling a full, always-on defense stack can severely hurt RAG utility: in the authors' experiments, contextual recall drops by more than 40%, which indicates that retrieval degradation is the primary failure mode.
To address this security–utility trade-off, the authors propose the Sentinel-Strategist architecture for Adaptive Defense Orchestration (ADO), in which a Sentinel detects anomalous retrieval behavior and a Strategist then deploys only the defenses warranted by each query's context.
Across three benchmark datasets and five orchestration models, ADO largely eliminates MBA-style membership-inference leakage while recovering retrieval utility to near undefended-baseline levels. Under data poisoning, the strongest ADO variants drive attack success to near zero while restoring contextual recall to more than 75% of the undefended baseline, though robustness remains sensitive to the choice of orchestration model.
Overall, the results suggest that adaptive, query-aware defense orchestration can substantially improve robustness without paying the heavy utility costs of static defenses.

Abstract

Retrieval-augmented generation (RAG) systems are increasingly deployed in sensitive domains such as healthcare and law, where they rely on private, domain-specific knowledge. This capability introduces significant security risks, including membership inference, data poisoning, and unintended content leakage. A straightforward mitigation is to enable all relevant defenses simultaneously, but doing so incurs a substantial utility cost. In our experiments, an always-on defense stack reduces contextual recall by more than 40%, indicating that retrieval degradation is the primary failure mode. To mitigate this trade-off in RAG systems, we propose the Sentinel-Strategist architecture, a context-aware framework for risk analysis and defense selection. A Sentinel detects anomalous retrieval behavior, after which a Strategist selectively deploys only the defenses warranted by the query context. Evaluated across three benchmark datasets and five orchestration models, ADO is shown to eliminate MBA-style membership inference leakage while substantially recovering retrieval utility relative to a fully static defense stack, approaching undefended baseline levels. Under data poisoning, the strongest ADO variants reduce attack success to near zero while restoring contextual recall to more than 75% of the undefended baseline, although robustness remains sensitive to model choice. Overall, these findings show that adaptive, query-aware defense can substantially reduce the security-utility trade-off in RAG systems.
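
To make the Sentinel/Strategist split concrete, here is a minimal Python sketch of the control flow the abstract describes: detect anomalies first, then enable only the defenses those anomalies call for. It is not the authors' implementation. The abstract does not name the individual defenses or the Sentinel's signals, and the paper uses orchestration models where this sketch uses fixed rules, so every name, field, and threshold below (`RetrievalTrace`, `sentinel`, `strategist`, the defense labels, the 0.98 cutoff) is a hypothetical placeholder.

```python
from dataclasses import dataclass

# Hypothetical mapping from a risk flag to the defense that addresses it.
# The source does not list the real defenses; these labels are placeholders.
DEFENSE_FOR_FLAG = {
    "membership_probe": "membership_inference_guard",
    "possible_poisoning": "poisoning_filter",
    "extraction_attempt": "leakage_redactor",
}


@dataclass
class RetrievalTrace:
    """Signals observed for one query (all fields are illustrative assumptions)."""
    query: str
    top_scores: list[float]          # similarity scores of the retrieved chunks
    near_duplicate_of_corpus: bool   # query closely mirrors a stored document
    untrusted_sources: int           # retrieved chunks from unvetted sources


def sentinel(trace: RetrievalTrace) -> dict[str, bool]:
    """Flag anomalous retrieval behavior using simple placeholder heuristics."""
    suspiciously_exact = bool(trace.top_scores) and max(trace.top_scores) > 0.98
    return {
        "membership_probe": trace.near_duplicate_of_corpus or suspiciously_exact,
        "possible_poisoning": trace.untrusted_sources > 0,
        "extraction_attempt": "verbatim" in trace.query.lower(),
    }


def strategist(flags: dict[str, bool]) -> list[str]:
    """Select only the defenses whose risk flag was raised for this query."""
    return [defense for flag, defense in DEFENSE_FOR_FLAG.items() if flags.get(flag)]


def always_on() -> list[str]:
    """The static baseline: every defense enabled for every query."""
    return list(DEFENSE_FOR_FLAG.values())


if __name__ == "__main__":
    benign = RetrievalTrace("What are common side effects of ibuprofen?",
                            [0.71, 0.65], False, 0)
    probe = RetrievalTrace("Patient record 4411 states that the patient",
                           [0.99], True, 0)
    print(strategist(sentinel(benign)))  # no flags raised, so no defenses run
    print(strategist(sentinel(probe)))   # only the membership-inference defense
    print(always_on())                   # the static stack runs everything
```

In this toy version a benign query passes through with no defenses enabled, which is the mechanism by which adaptive selection could avoid the recall loss of an always-on stack; how well that holds in practice depends on the quality of the Sentinel's detection, which the abstract reports as sensitive to model choice.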