Structured Semantic Cloaking for Jailbreak Attacks on Large Language Models
arXiv cs.CL / 3/18/2026
Key Points
- Structured Semantic Cloaking (S2C) is a multi-dimensional jailbreak framework that manipulates how malicious semantic intent is reconstructed during LLM inference in order to bypass safety mechanisms.
- It combines three mechanisms (Contextual Reframing, Content Fragmentation, and Clue-Guided Camouflage) to delay semantic consolidation and degrade safety triggers while preserving some output recoverability.
- The authors evaluate S2C on multiple open-source and proprietary LLMs using HarmBench and JBB-Behaviors, reporting Attack Success Rate (ASR) improvements over the current state of the art of 12.4% and 9.7%, respectively. On GPT-5-mini, the reported gain on JBB-Behaviors is 26%.
- The study analyzes which combinations of mechanisms perform best against broad model families and discusses the trade-off between the extent of obfuscation and input recoverability.
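
The ASR figures above are easier to read with the metric written out. ASR is commonly computed as the fraction of attempted behaviors that a judge marks as successful, and this summary does not say whether the reported improvements are percentage-point differences or relative changes. The sketch below shows how the two readings differ. It is a minimal illustration only: the function names and the judge verdicts are hypothetical, not taken from the paper or from the HarmBench or JBB-Behaviors tooling.

```python
def attack_success_rate(judgements):
    """Fraction of judged attempts marked successful (True)."""
    if not judgements:
        raise ValueError("need at least one judged attempt")
    return sum(judgements) / len(judgements)


def absolute_gain_pp(asr_new, asr_base):
    """Difference between two ASR values, in percentage points."""
    return (asr_new - asr_base) * 100


def relative_gain_pct(asr_new, asr_base):
    """Change of asr_new relative to asr_base, in percent."""
    if asr_base == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (asr_new - asr_base) / asr_base * 100


# Hypothetical judge verdicts for 10 behaviors each (NOT data from the paper).
baseline_verdicts = [True] * 5 + [False] * 5   # ASR = 0.5
candidate_verdicts = [True] * 6 + [False] * 4  # ASR = 0.6

asr_base = attack_success_rate(baseline_verdicts)
asr_new = attack_success_rate(candidate_verdicts)

print(f"baseline ASR:  {asr_base:.2f}")
print(f"candidate ASR: {asr_new:.2f}")
print(f"absolute gain: {absolute_gain_pp(asr_new, asr_base):.1f} pp")
print(f"relative gain: {relative_gain_pct(asr_new, asr_base):.1f} %")
```

With these made-up verdicts, the same pair of results reads as a gain of about 10 percentage points or about 20% relative, which is why the reporting convention matters when comparing figures across papers.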




