Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
arXiv cs.AI / 4/22/2026
Key Points
- The paper argues that existing RAG jamming attacks mainly produce obvious denials or explicit refusals. It instead formalizes a subtler “soft failure” threat, in which responses stay fluent but uninformative.
- It introduces DEJA, a black-box automated framework that creates adversarial documents to trigger soft failures by leveraging safety-aligned behaviors in large language models.
- DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to reduce answer certainty while preserving high retrieval success.
- Experiments across multiple RAG setups and benchmark datasets show DEJA achieves high soft-failure success (SASR > 79%) while keeping hard-failure rates low (< 15%), outperforming prior methods.
- The adversarial documents are designed to be stealthy: they evade perplexity-based detection, remain effective under query paraphrasing, and transfer to proprietary models without retargeting.
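The optimization idea in the key points (evolve candidate documents, keep those that are still retrieved but drive answer utility down) follows the shape of a generic elitist evolutionary loop. The sketch below is a minimal, hedged illustration of that loop, not DEJA's implementation: the `retrieval` and `utility` scorers are hypothetical caller-supplied stand-ins (the paper uses an LLM-based Answer Utility Score), and the linear fitness with weight `alpha`, the mutation operator, and all function names are assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical stand-ins: any callable returning a float can act as a scorer.
# In the paper the utility score comes from an LLM judge; nothing here calls one.
Scorer = Callable[[str], float]
Mutator = Callable[[str, random.Random], str]


def fitness(doc: str, retrieval: Scorer, utility: Scorer, alpha: float = 1.0) -> float:
    """Higher is better: reward retrievability, penalize answer utility.

    The linear combination and the weight `alpha` are assumptions.
    """
    return retrieval(doc) - alpha * utility(doc)


def evolve(
    seeds: List[str],
    retrieval: Scorer,
    utility: Scorer,
    mutate: Mutator,
    generations: int = 20,
    population: int = 8,
    alpha: float = 1.0,
    seed: int = 0,
) -> Tuple[str, float]:
    """Elitist (mu + lambda) loop: parents compete with their mutated children."""
    if not seeds:
        raise ValueError("need at least one seed document")
    rng = random.Random(seed)  # fixed seed for reproducible runs

    def score(doc: str) -> float:
        return fitness(doc, retrieval, utility, alpha)

    pool = list(dict.fromkeys(seeds))  # de-duplicate, keep insertion order
    for _ in range(generations):
        children = [mutate(parent, rng) for parent in pool]
        # Merge parents and children (order-preserving de-dup), keep the fittest.
        merged = list(dict.fromkeys(pool + children))
        pool = sorted(merged, key=score, reverse=True)[:population]
    best = pool[0]
    return best, score(best)
```

Keeping parents in the pool (elitism) guarantees the best fitness never decreases from one generation to the next. In a setting like the one the paper describes, the cost would be dominated by the scorer calls (an LLM judge per candidate), not by the loop itself.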
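The two headline numbers (soft-failure success above 79%, hard failures below 15%) can be read as proportions over judged queries. A minimal sketch, assuming each attacked query has already been labeled as a normal answer, a soft failure, or a hard failure; the `Outcome` labels and the `failure_rates` helper are hypothetical, and the paper's exact definition of SASR (including its denominator) may differ.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class Outcome(Enum):
    """Hypothetical per-query labels; the paper's judging protocol may differ."""

    NORMAL = "normal"  # informative answer: the attack had no visible effect
    SOFT_FAILURE = "soft"  # fluent but uninformative answer
    HARD_FAILURE = "hard"  # explicit refusal or denial


def failure_rates(outcomes: Iterable[Outcome]) -> Dict[str, float]:
    """Return the fraction of queries ending in each failure mode."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no outcomes to score")
    return {
        "soft_rate": counts[Outcome.SOFT_FAILURE] / total,
        "hard_rate": counts[Outcome.HARD_FAILURE] / total,
    }
```

With made-up counts for illustration, eight soft failures, one hard failure, and one normal answer out of ten queries give a soft rate of 0.8 and a hard rate of 0.1. Tracking both rates separately matters here, because a high soft rate only signals a stealthy attack if the hard (refusal) rate stays low.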