SafeRoPE: Risk-specific Head-wise Embedding Rotation for Safe Generation in Rectified Flow Transformers
arXiv cs.CV / 4/3/2026
Key Points
- The paper analyzes rectified-flow transformer text-to-image models (e.g., MMDiT) and shows that unsafe semantics are concentrated in identifiable, low-dimensional attention subspaces within a small set of safety-critical heads.
- It introduces SafeRoPE, which uses head-wise decomposition of unsafe embeddings to compute a Latent Risk Score (LRS) by projecting input vectors onto these unsafe subspaces.
- SafeRoPE applies targeted, head-wise perturbations to Rotary Positional Embedding (RoPE) on query/key vectors to suppress unsafe concepts while preserving benign content and overall image quality.
- By combining LRS-guided risk estimation with risk-specific RoPE rotation, SafeRoPE provides lightweight, fine-grained safety mitigation, avoiding the costly fine-tuning or attention-modulation approaches that are difficult to adapt to transformer-based diffusion models.
- The authors report extensive experiments achieving state-of-the-art trade-offs between harmful-content mitigation and utility preservation for safe generation in MMDiT, and they release code on GitHub.
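The mechanism described above can be sketched in a toy form. The snippet below is an illustrative approximation, not the paper's implementation: the function names, the choice of projection-energy ratio as the risk score, and the policy of scaling RoPE angle perturbations by that score are all assumptions for clarity; the paper's actual LRS and head-wise rotation scheme may differ.

```python
import numpy as np

def latent_risk_score(q, basis):
    """Hypothetical Latent Risk Score: fraction of a query vector's
    energy lying in the unsafe subspace spanned by `basis`.
    q:     (d,) query vector for one attention head
    basis: (d, k) orthonormal basis of the unsafe subspace
    """
    proj = basis @ (basis.T @ q)  # orthogonal projection of q onto the subspace
    return np.linalg.norm(proj) / (np.linalg.norm(q) + 1e-8)

def rope_rotate(x, angles):
    """RoPE-style rotation: rotate consecutive 2-D pairs of x by `angles`.
    x:      (d,) vector, d even
    angles: (d//2,) rotation angle per 2-D pair
    """
    x1, x2 = x[0::2], x[1::2]
    c, s = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[0::2] = c * x1 - s * x2
    out[1::2] = s * x1 + c * x2
    return out

rng = np.random.default_rng(0)
d, k = 64, 4
# Toy stand-in for a learned low-dimensional unsafe subspace of one head
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
q = rng.standard_normal(d)

lrs = latent_risk_score(q, basis)
# Toy mitigation policy: perturb RoPE angles in proportion to the risk score,
# so benign inputs (low LRS) are left nearly untouched
delta = 0.5 * lrs * rng.standard_normal(d // 2)
q_safe = rope_rotate(q, delta)
```

Because each 2-D pair rotation is orthogonal, the perturbation preserves the vector's norm, which is consistent with the goal of suppressing unsafe concepts while leaving overall representation magnitude, and hence benign content, largely intact.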