Semantic Chameleon: Corpus-Dependent Poisoning Attacks and Defenses in RAG Systems
arXiv cs.AI / 3/20/2026
Key Points
- The paper examines gradient-guided corpus poisoning attacks in Retrieval-Augmented Generation (RAG) systems, showing attackers can manipulate the retrieval corpus to bias model outputs.
- It introduces dual-document poisoning (a sleeper document and a trigger document) optimized with Greedy Coordinate Gradient (GCG), achieving a 38.0% co-retrieval rate under pure vector retrieval on a 67,941-document Security Stack Exchange corpus across 50 attack attempts.
- A simple defense—hybrid retrieval combining BM25 and vector similarity—greatly reduces attack success, lowering it from 38% to 0% without modifying the LLM or retraining the retriever; however, attackers could still partially circumvent it with payloads crafted to score well on both sparse and dense signals.
- Cross-model evaluation across GPT-5.3, GPT-4o, Claude Sonnet 4.6, Llama 4, and GPT-4o-mini shows attack success ranging from 46.7% to 93.3%, while cross-corpus FEVER experiments yield 0% success across configurations, indicating that both attack effectiveness and defense robustness are dataset- and model-dependent.
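The hybrid-retrieval defense can be illustrated with a minimal sketch. The paper combines BM25 and vector-similarity signals; reciprocal rank fusion (RRF) is one common way to merge the two rankings, used here as an assumption rather than the paper's exact fusion method. All document IDs and rankings below are hypothetical.

```python
def rrf_fuse(sparse_ranking, dense_ranking, k=60):
    """Merge two ranked lists of doc IDs via reciprocal rank fusion.

    A document optimized to score well only on dense (embedding)
    similarity drops in the fused ranking unless it also ranks
    well on the sparse (BM25) side -- the intuition behind why
    hybrid retrieval blunts embedding-targeted poisoning.
    """
    scores = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking):
            # Standard RRF contribution: 1 / (k + rank), 1-indexed.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: "poison" tops the dense ranking (as a
# GCG-optimized document might) but is absent from the BM25 ranking,
# so legitimate documents outrank it after fusion.
sparse = ["doc_a", "doc_b", "doc_c"]   # BM25 ranking
dense = ["poison", "doc_a", "doc_b"]   # vector-similarity ranking
fused = rrf_fuse(sparse, dense)
# → ["doc_a", "doc_b", "poison", "doc_c"]
```

As the bullet above notes, this defense is not absolute: a payload crafted to rank highly on both signals would still surface in the fused list.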