Multi-Stream Perturbation Attack: Breaking Safety Alignment of Thinking LLMs Through Concurrent Task Interference
arXiv cs.AI / 3/12/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper identifies vulnerabilities in the thinking mode of LLMs when they process multiple interleaved tasks, highlighting new safety risks.
- It introduces the multi-stream perturbation attack, which interleaves multiple task streams within a single prompt to create interference, and pairs it with three perturbation strategies: multi-stream interleaving, inversion perturbation, and shape transformation (a rough sketch of the interleaving idea follows this list).
- Experiments on JailbreakBench, AdvBench, and HarmBench show the attack achieving high success rates against models including the Qwen3 series, DeepSeek, Qwen3-Max, and Gemini 2.5 Flash, inducing thinking collapse in up to 17% of cases and response repetition in up to 60%.
- The results indicate that thinking-mode-based safety mechanisms can be bypassed and that concurrent task interference can degrade model thinking, underscoring safety implications for current and future LLM deployments.
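To make the interleaving idea concrete, here is a minimal Python sketch. The paper's exact interleaving scheme, tagging format, and perturbation details are not given in this summary, so the `interleave_streams` helper and its round-robin merge are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of multi-stream interleaving: merge the steps of several
# task "streams" line by line into one prompt, so the model must switch
# context between tasks at every step.

def interleave_streams(streams: list[list[str]], tag: str = "TASK") -> str:
    """Round-robin merge of task streams, labeling each line with its stream id."""
    merged: list[str] = []
    longest = max(len(s) for s in streams)
    for i in range(longest):              # walk step positions across all streams
        for sid, stream in enumerate(streams):
            if i < len(stream):           # streams may have unequal lengths
                merged.append(f"[{tag}-{sid}] {stream[i]}")
    return "\n".join(merged)

if __name__ == "__main__":
    math_task = ["Compute 17 * 23.", "Then take the square root of the result."]
    trivia_task = ["Name three rivers in Europe.", "Pick the longest one."]
    # In the attack setting, an additional stream would carry the harmful
    # request, hidden among the benign streams.
    print(interleave_streams([math_task, trivia_task]))
```

The intuition, per the paper's framing, is that forcing the model's thinking trace to juggle several concurrent tasks interferes with the safety checks that normally run over a single coherent reasoning stream.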