TwinGate: Stateful Defense against Decompositional Jailbreaks in Untraceable Traffic via Asymmetric Contrastive Learning
arXiv cs.CL / 5/1/2026
Key Points
- The paper introduces TwinGate, a stateful defense framework designed to detect decompositional jailbreak attempts against LLMs in realistic settings with anonymized, untraceable, and interleaved traffic.
- TwinGate uses Asymmetric Contrastive Learning with a dual-encoder design to cluster semantically different but intent-matched malicious query fragments, while a parallel frozen encoder reduces false positives from benign topical similarity.
- The method is built for deployment efficiency: each request needs only a single lightweight forward pass and can run in parallel with the LLM’s prefill stage to keep latency overhead negligible.
- To support evaluation, the authors release a large dataset of 3.62M+ instructions covering 8,600 distinct malicious intents and test TwinGate under a strictly causal protocol.
- Results indicate TwinGate achieves strong malicious-intent recall with a low false-positive rate, remains robust to adaptive attacks, and outperforms both stateful and stateless baselines on throughput and latency.
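
The stateful idea in the points above can be illustrated with a toy sketch. This is not the authors' implementation: the `StatefulGate` class, the `margin` and `min_matches` parameters, and the "intent similarity minus topical similarity" decision rule are all assumptions made for illustration. The vectors stand in for outputs of a trained intent encoder and a frozen topical encoder, neither of which is modeled here. The one property taken from the summary is that the gate keeps a shared memory with no user or session IDs and consults only earlier requests.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class StatefulGate:
    """Hypothetical gate over a shared memory of recent requests (no user IDs)."""

    def __init__(self, margin=0.5, min_matches=2, window=1000):
        self.margin = margin            # assumed threshold, not from the paper
        self.min_matches = min_matches  # assumed number of linked fragments
        self.memory = deque(maxlen=window)  # (intent_vec, topic_vec) pairs

    def check(self, intent_vec, topic_vec):
        """Return True if the request appears linked to earlier fragments.

        Only requests already in memory are consulted, so the decision
        never depends on future traffic.
        """
        matches = 0
        for past_intent, past_topic in self.memory:
            intent_sim = cosine(intent_vec, past_intent)
            topic_sim = cosine(topic_vec, past_topic)
            # Assumed rule: count a match only when intent similarity
            # clearly exceeds what plain topical similarity explains.
            if intent_sim - topic_sim >= self.margin:
                matches += 1
        self.memory.append((intent_vec, topic_vec))
        return matches >= self.min_matches


# Toy data: fragments on unrelated topics that share one intent direction.
fragments = [
    ([1.0, 0.1], [1.0, 0.0, 0.0]),
    ([0.9, 0.2], [0.0, 1.0, 0.0]),
    ([1.0, 0.0], [0.0, 0.0, 1.0]),
]
gate = StatefulGate()
flags = [gate.check(intent, topic) for intent, topic in fragments]

# Toy data: benign requests that simply share a topic.
benign = [
    ([0.1, 1.0], [1.0, 0.0, 0.0]),
    ([0.2, 0.9], [1.0, 0.0, 0.0]),
    ([0.0, 1.0], [1.0, 0.0, 0.0]),
]
benign_gate = StatefulGate()
benign_flags = [benign_gate.check(intent, topic) for intent, topic in benign]
```

With these toy vectors, only the third fragment is flagged, since it is the first to match two earlier fragments, and none of the topically similar benign requests are flagged. A linear scan over memory is used here for readability; a deployed system would presumably need an indexed or approximate nearest-neighbor lookup to meet the latency claims in the summary.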