Dynamic Eraser for Guided Concept Erasure in Diffusion Models

arXiv cs.CV / 4/21/2026


Key Points

  • The paper introduces Dynamic Semantic Steering (DSS), a lightweight, training-free, inference-time method for erasing specific concepts from text-to-image diffusion models to support safe content generation.
  • DSS combines two components: Sensitive Semantic Boundary Modeling (SSBM), which automatically discovers "safe" semantic anchors, and Sensitive Semantic Guidance (SSG), which uses cross-attention features to detect sensitive content and applies a closed-form correction.
  • The authors argue DSS avoids common failure modes of prior work, such as over-correction, semantic drift, and even representation collapse.
  • Experiments report an average erasure rate of 91.0% with minimal degradation in output fidelity, significantly outperforming prior state-of-the-art methods (the abstract cites 18.6% to 85.9% for this comparison).
  • Overall, the approach aims to provide interpretable, controllable, and more reliable concept suppression compared with token-level or feature-correction baselines.

Abstract

Concept erasure in Text-To-Image (T2I) diffusion models is vital for safe content generation, but existing inference-time methods face significant limitations. Feature-correction approaches often cause uncontrolled over-correction, while token-level interventions struggle with semantic granularity and context. Moreover, both types of methods are prone to severe semantic drift or even complete representation collapse. To address these challenges, we present Dynamic Semantic Steering (DSS), a lightweight, training-free framework for interpretable and controllable concept erasure. DSS introduces: 1) Sensitive Semantic Boundary Modeling (SSBM) to automate the discovery of safe semantic anchors, and 2) Sensitive Semantic Guidance (SSG), which leverages cross-attention features for precise detection and performs correction via a closed-form solution derived from a well-posed objective. This ensures optimal suppression of sensitive content while preserving benign semantics. DSS achieves an average erasure rate of 91.0%, significantly outperforming SOTA methods (from 18.6% to 85.9%) with minimal impact on output fidelity.
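The abstract does not state the SSG objective, so the following is only an illustrative sketch of what a "detect, then correct in closed form" step can look like, not the paper's method. It assumes features are plain vectors, that detection is a cosine-similarity threshold against a single sensitive direction, and that the objective is a minimum-norm edit constrained to match a safe anchor along that direction: minimize ||x - e||^2 subject to <x, s> = <a, s>, whose solution is x = e - (<e, s> - <a, s>) s for unit s. The names `steer`, `sensitive_dir`, `anchor`, and the threshold `tau` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(dot(v, v))
    if n == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(normalize(u), normalize(v))


def steer(feature, sensitive_dir, anchor, tau=0.5):
    """Hypothetical detect-then-correct step (not taken from the paper).

    Detection: flag the feature if its cosine similarity to the
    sensitive direction exceeds the assumed threshold tau.
    Correction: solve min_x ||x - feature||^2 s.t. <x, s> = <anchor, s>.
    Via a Lagrange multiplier, x = feature + mu * s with
    mu = <anchor, s> - <feature, s> when s has unit length.
    """
    if cosine(feature, sensitive_dir) <= tau:
        return list(feature)  # judged benign: leave untouched
    s = normalize(sensitive_dir)
    shift = dot(feature, s) - dot(anchor, s)
    # Move only along s, so components orthogonal to s are preserved.
    return [f - shift * si for f, si in zip(feature, s)]


if __name__ == "__main__":
    # Flagged: the sensitive component (3.0) is replaced by the anchor's (0.0).
    print(steer([3.0, 1.0], [1.0, 0.0], [0.0, 2.0]))
    # Benign: similarity to the sensitive direction is below tau.
    print(steer([0.2, 1.0], [1.0, 0.0], [0.0, 2.0]))
```

Because the edit moves the feature only along the sensitive direction, everything orthogonal to it (the stand-in for "benign semantics" in this toy setting) is left unchanged; for example, `steer([3.0, 1.0], [1.0, 0.0], [0.0, 2.0])` returns `[0.0, 1.0]`.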