TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models
arXiv cs.CV / 4/20/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper identifies a new text-to-image safety vulnerability called Multi-Concept Compositional Unsafety (MCCU), where harmful meaning can emerge from implicit associations between individually benign concepts.
- It introduces TwoHamsters, a benchmark of 17.5k prompts designed specifically to test for MCCU risks.
- Evaluations across 10 state-of-the-art text-to-image models and 16 defense methods show that both models and defenses can fail badly under MCCU.
- Headline results include FLUX reaching a 99.52% MCCU generation success rate and LLaVA-Guard achieving only 41.06% recall on unsafe outputs, underscoring a major gap in existing safety approaches.
- The study provides 8 key insights intended to guide more effective defenses for compositional, semantics-driven unsafe generation in T2I systems.
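The two headline numbers above are standard benchmark metrics. As a hypothetical sketch (the paper's actual harness, data structures, and judging procedure are not described here), an MCCU generation success rate and a safety guard's recall could be computed like this:

```python
# Hypothetical illustration only: these functions and toy data are NOT from the
# TwoHamsters paper; they show how the two reported metrics are typically derived.

def mccu_success_rate(unsafe_outcomes):
    """Fraction of benchmark prompts whose generated image was judged unsafe."""
    return sum(unsafe_outcomes) / len(unsafe_outcomes)

def guard_recall(guard_flags, unsafe_labels):
    """Of the truly unsafe generations, the fraction the guard actually flagged."""
    flagged_on_unsafe = [f for f, lbl in zip(guard_flags, unsafe_labels) if lbl]
    return sum(flagged_on_unsafe) / len(flagged_on_unsafe)

# Toy run: 9 of 10 prompts yield an unsafe image; the guard catches only 4 of those 9.
unsafe_labels = [True] * 9 + [False]
guard_flags = [True] * 4 + [False] * 6

print(mccu_success_rate(unsafe_labels))      # 0.9
print(guard_recall(guard_flags, unsafe_labels))  # 4/9 ≈ 0.444
```

A high success rate here means the model frequently produces unsafe images for MCCU prompts, while low guard recall means most of those unsafe outputs slip past the filter, which is exactly the failure mode the benchmark is designed to expose.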