Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs
arXiv cs.AI / 4/15/2026
Key Points
- The paper argues that the growing capabilities of vision-language models (VLMs) have expanded their adversarial attack surface beyond superficial pixel/typographic attacks, leaving natural-image semantic vulnerabilities underexplored.
- It introduces MemJack, a memory-augmented multi-agent jailbreak framework that maps visual entities to malicious intents, crafts adversarial prompts using multi-angle visual-semantic camouflage, and applies an Iterative Nullspace Projection filter to evade latent-space refusal mechanisms.
- MemJack maintains coherent multi-turn jailbreak interactions across different images by storing and transferring successful strategies in a persistent multimodal experience memory, improving generalization to new images.
- Experiments on full, unmodified COCO val2017 images report a 71.48% attack success rate against Qwen3-VL-Plus, reaching about 90% with extended compute budgets.
- To support defense research, the authors plan to release MemJack-Bench, a dataset of 113,000+ interactive multimodal jailbreak trajectories for studying and aligning more robust VLMs.
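The paper does not spell out the internals of its Iterative Nullspace Projection filter, but the name suggests the standard linear-algebra operation: repeatedly projecting a representation onto the nullspace of one or more "refusal" directions so the component the model's safety mechanism keys on is removed. The sketch below is a hypothetical illustration of that core operation only (the function name, inputs, and the idea of precomputed refusal directions are all assumptions, not the authors' implementation):

```python
import numpy as np

def iterative_nullspace_projection(embedding, refusal_dirs):
    """Illustrative sketch: strip refusal-direction components from a vector.

    For each (hypothetical) refusal direction r, subtract the embedding's
    component along r, i.e. project onto the nullspace of r. This is a
    generic technique, not the paper's actual filter.
    """
    e = np.asarray(embedding, dtype=float).copy()
    for r in refusal_dirs:
        r = np.asarray(r, dtype=float)
        r = r / np.linalg.norm(r)      # normalize the direction
        e = e - np.dot(e, r) * r       # remove the component along r
    return e

# Toy example with orthogonal directions: the result is orthogonal to both.
e = np.array([1.0, 2.0, 3.0])
r1 = np.array([1.0, 0.0, 0.0])
r2 = np.array([0.0, 1.0, 0.0])
out = iterative_nullspace_projection(e, [r1, r2])
```

Note that with non-orthogonal directions a single pass leaves residual components along earlier directions, which is presumably why such a filter would iterate.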