Neighbor-Aware Localized Concept Erasure in Text-to-Image Diffusion Models

arXiv cs.CV · March 30, 2026


Key Points

  • The paper addresses “concept erasure” in text-to-image diffusion models, where an unwanted concept is removed while preserving overall generation quality.
  • It argues that existing localized erasure can unintentionally weaken semantically related neighboring concepts, hurting fidelity on fine-grained categories.
  • The proposed Neighbor-Aware Localized Concept Erasure (NLCE) is a training-free, three-stage method that suppresses target concept embeddings, uses attention to find residual activation regions, and then applies spatially gated hard erasure only where needed.
  • Experiments on Oxford Flowers and Stanford Dogs show NLCE removes the target concept more effectively while better preserving closely related neighboring categories.
  • Additional tests indicate robustness and generalization across broader erasure scenarios, including celebrity identity, explicit content, and artistic style.

Abstract

Concept erasure in text-to-image diffusion models seeks to remove undesired concepts while preserving overall generative capability. Localized erasure methods aim to restrict edits to the spatial region occupied by the target concept. However, we observe that suppressing a concept can unintentionally weaken semantically related neighbor concepts, reducing fidelity in fine-grained domains. We propose Neighbor-Aware Localized Concept Erasure (NLCE), a training-free framework designed to better preserve neighboring concepts while removing target concepts. It operates in three stages: (1) a spectrally-weighted embedding modulation that attenuates target concept directions while stabilizing neighbor concept representations, (2) an attention-guided spatial gate that identifies regions exhibiting residual concept activation, and (3) a spatially-gated hard erasure that eliminates remaining traces only where necessary. This neighbor-aware pipeline enables localized concept removal while maintaining the surrounding concept neighborhood structure. Experiments on fine-grained datasets (Oxford Flowers, Stanford Dogs) show that our method effectively removes target concepts while better preserving closely related categories. Additional results on celebrity identity, explicit content, and artistic style demonstrate robustness and generalization to broader erasure scenarios.
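The three-stage pipeline in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's implementation: the embedding shapes, the orthogonalization-based "spectral" attenuation, the threshold `tau`, and the zero-fill erasure are all assumptions made for the sake of a runnable example.

```python
# Illustrative sketch of NLCE's three stages (all details hypothetical,
# simplified from the paper's description of the pipeline).
import numpy as np

def modulate_embedding(prompt_emb, target_dir, neighbor_dirs, alpha=1.0):
    """Stage 1: attenuate the target-concept direction in the prompt
    embedding while stabilizing neighbor concepts. A spectrally-weighted
    variant would rescale singular directions; here we simply orthogonalize
    the target direction against each neighbor direction before removing
    it, so the erasure does not bleed into neighboring concepts."""
    t = target_dir / np.linalg.norm(target_dir)
    for n in neighbor_dirs:
        n = n / np.linalg.norm(n)
        t = t - (t @ n) * n          # project out the neighbor component
    t = t / np.linalg.norm(t)        # assumes target != any neighbor
    return prompt_emb - alpha * (prompt_emb @ t) * t

def spatial_gate(attn_map, tau=0.5):
    """Stage 2: attention-guided spatial gate. Normalize the attention
    map to [0, 1] and mark regions with residual concept activation."""
    span = attn_map.max() - attn_map.min()
    a = (attn_map - attn_map.min()) / (span + 1e-8)
    return a > tau

def hard_erase(features, mask, fill=0.0):
    """Stage 3: spatially-gated hard erasure. Overwrite features only
    inside the gated region; everything outside stays untouched."""
    out = features.copy()
    out[mask] = fill
    return out
```

Note the design point the abstract emphasizes: because stage 1's target direction is orthogonalized against neighbor directions, the component of the embedding along a neighbor is unchanged by the modulation, and stage 3 only touches pixels the gate flags, keeping the edit localized.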