Semantic Denial of Service in LLM-controlled robots
arXiv cs.AI, April 29, 2026
Key Points
- The paper shows that LLM-based safety instruction-following for robots can introduce an availability vulnerability, allowing attackers to disrupt robot behavior without jailbreaking or overriding policies.
- By injecting very short, safety-plausible phrases (1–5 tokens) into a robot’s audio channel, an adversary can trigger the LLM’s safety reasoning to halt, delay, or otherwise disrupt execution (a pipeline sketch follows this list).
- Across four vision-language models and a range of defense and deployment settings, prompt-only defenses often reduce “hard-stop” attacks on some models but shift the failure into other disruption modes, such as acknowledgement loops and false alerts; the paper quantifies this with a Disruption Success Rate (DSR, sketched below).
- The study finds that varying the injected safety phrases is consistently more effective than repeating the same phrase, implying the models treat diverse safety cues as corroborating evidence.
- The authors argue the mitigation should be architectural: systems that route unauthenticated audio text directly into the LLM create an avoidable security dependency between safety monitoring and action selection (a gating sketch closes this section).
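To make the attack surface concrete, here is a minimal sketch of the kind of pipeline the paper targets: transcribed audio concatenated, unauthenticated, into the planner prompt. The function names, phrase pool, and prompt template are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch (assumed pipeline, not the paper's implementation):
# ASR output is concatenated directly into the planner prompt, so any
# bystander audio -- including a 1-5 token safety-plausible phrase --
# lands in the same channel the LLM uses for safety reasoning.

SAFETY_PLAUSIBLE_PHRASES = [  # hypothetical attacker pool; per the paper,
    "stop",                   # *varying* such phrases across turns beats
    "watch out",              # repeating a single one
    "someone's hurt",
    "fire alarm",
]

def build_planner_prompt(task: str, scene_summary: str, heard_text: str) -> str:
    # heard_text comes straight from speech-to-text with no speaker
    # authentication, so injected phrases are indistinguishable from
    # legitimate operator speech.
    return (
        f"Task: {task}\n"
        f"Scene: {scene_summary}\n"
        f"Heard nearby: {heard_text}\n"
        "If anything suggests a safety issue, pause and address it first."
    )

# An attacker who plays "someone's hurt" near the microphone makes the
# safety clause fire without ever jailbreaking the model:
prompt = build_planner_prompt("fetch the red mug",
                              "kitchen, no people visible",
                              SAFETY_PLAUSIBLE_PHRASES[2])
```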

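The headline metric is the Disruption Success Rate. A plausible reading, consistent with the key points above, is the fraction of attacked episodes that end in any disruption mode, not only hard stops; the episode schema and mode taxonomy below are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed taxonomy covering the failure modes named above; the paper's
# exact categories may differ.
DISRUPTION_MODES = {"hard_stop", "delay", "ack_loop", "false_alert"}

@dataclass
class Episode:
    attacked: bool
    outcome: str  # e.g. "completed", "hard_stop", "ack_loop", ...

def disruption_success_rate(episodes: list[Episode]) -> float:
    """Fraction of attacked episodes ending in any disruption mode."""
    attacked = [e for e in episodes if e.attacked]
    disrupted = [e for e in attacked if e.outcome in DISRUPTION_MODES]
    return len(disrupted) / len(attacked) if attacked else 0.0
```

Counting every disruption mode under one number is what lets the metric expose defenses that merely shift failures from hard stops into acknowledgement loops or false alerts.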

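The architectural fix the authors gesture at amounts to breaking that dependency rather than hardening the prompt. One minimal, assumed gating policy, with invented names:

```python
def route_audio_text(heard_text: str, speaker_verified: bool) -> dict:
    """Invented gate: only authenticated speech reaches the planner.

    Unverified audio is diverted to a low-privilege monitor that can log
    and alert a human but cannot halt or redirect the robot on its own.
    """
    if speaker_verified:
        return {"planner_input": heard_text, "monitor_input": None}
    return {"planner_input": None, "monitor_input": heard_text}
```

The gate does not try to classify phrases as safe or unsafe, which is exactly the prompt-level defense the paper shows can be sidestepped; it simply ensures unauthenticated text never reaches the component that selects actions.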
