Z-Erase: Enabling Concept Erasure in Single-Stream Diffusion Transformers

arXiv cs.CV / 3/27/2026

Key Points

  • Z-Erase is presented as the first concept erasure method specifically designed for single-stream diffusion transformers used in text-to-image generation.
  • The paper argues that directly reusing prior concept-erasure techniques from U-Net or dual-stream models can cause generation collapse in single-stream architectures, motivating a new framework.
  • It introduces a Stream Disentangled Concept Erasure Framework that decouples updates to make erasure feasible without destabilizing image generation.
  • Z-Erase also proposes Lagrangian-Guided Adaptive Erasure Modulation, which uses a constrained optimization approach to balance removing unwanted concepts while preserving overall generation quality.
  • Experiments report state-of-the-art performance across multiple tasks, and the paper includes convergence analysis showing the method can converge to a Pareto stationary point.
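
The constrained balancing idea in the Key Points can be illustrated with a generic primal-dual (Lagrangian) update on a toy problem. This is a minimal sketch, not the paper's algorithm: the two quadratic losses, the tolerance `eps`, the step sizes, and the function name `primal_dual_erase` are all assumptions made for illustration. It minimizes an "erase" loss f subject to a "preserve" loss g staying below `eps`, and lets the multiplier `lam` grow only while the constraint is violated.

```python
def primal_dual_erase(steps=2000, lr=0.05, lr_dual=0.05, eps=1.0):
    """Toy primal-dual iteration: minimize f(theta) subject to g(theta) <= eps.

    All quantities are hypothetical stand-ins, not the paper's objectives.
    """
    erase_target = [2.0, 0.0]  # point where the toy "erase" loss f is zero
    anchor = [0.0, 0.0]        # point where the toy "preserve" loss g is zero
    theta = [0.0, 0.0]         # toy parameters, initialized at the anchor
    lam = 0.0                  # Lagrange multiplier, kept non-negative

    def preserve_loss(params):
        # g(theta) = ||theta - anchor||^2
        return sum((t - a) ** 2 for t, a in zip(params, anchor))

    for _ in range(steps):
        # Gradients of f(theta) = ||theta - erase_target||^2 and of g(theta).
        grad_f = [2.0 * (t - e) for t, e in zip(theta, erase_target)]
        grad_g = [2.0 * (t - a) for t, a in zip(theta, anchor)]
        g_val = preserve_loss(theta)

        # Primal step: descend the Lagrangian f + lam * (g - eps) in theta.
        theta = [t - lr * (gf + lam * gg)
                 for t, gf, gg in zip(theta, grad_f, grad_g)]
        # Dual step: increase lam when g exceeds eps, then project onto lam >= 0.
        lam = max(0.0, lam + lr_dual * (g_val - eps))

    return theta, lam, preserve_loss(theta)


if __name__ == "__main__":
    theta, lam, g_val = primal_dual_erase()
    print("theta:", theta, "lambda:", lam, "preserve loss:", g_val)
```

For this toy problem the KKT conditions give theta = (1, 0) and lam = 1, so the iterates should settle near those values. The multiplier acts as an adaptive weight: it stays at zero while preservation is within tolerance and rises only when erasure starts to damage it.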

Abstract

Concept erasure serves as a vital safety mechanism for removing unwanted concepts from text-to-image (T2I) models. While extensively studied in U-Net and dual-stream architectures (e.g., Flux), this task remains under-explored in the recently emerging paradigm of single-stream diffusion transformers (e.g., Z-Image). In this new paradigm, text and image tokens are processed as a single unified sequence via shared parameters. Consequently, directly applying prior erasure methods typically leads to generation collapse. To bridge this gap, we introduce Z-Erase, the first concept erasure method tailored for single-stream T2I models. To guarantee stable image generation, Z-Erase first proposes a Stream Disentangled Concept Erasure Framework that decouples updates and enables existing methods to be applied to single-stream models. Subsequently, within this framework, we introduce Lagrangian-Guided Adaptive Erasure Modulation, a constrained optimization algorithm that further balances the sensitive erasure-preservation trade-off. Moreover, we provide a rigorous convergence analysis proving that Z-Erase can converge to a Pareto stationary point. Experiments demonstrate that Z-Erase successfully overcomes the generation collapse issue, achieving state-of-the-art performance across a wide range of tasks.
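
For reference, the standard two-objective definition of Pareto stationarity from multi-objective optimization is written out below. The symbols \mathcal{L}_{\mathrm{erase}}, \mathcal{L}_{\mathrm{pres}}, and \alpha are notation assumed here for illustration; the abstract does not state the paper's exact objectives.

```latex
% Assumed notation: an erasure loss and a preservation loss over parameters \theta.
\theta^\star \text{ is Pareto stationary if there exists } \alpha \in [0, 1] \text{ such that }
\alpha \, \nabla_\theta \mathcal{L}_{\mathrm{erase}}(\theta^\star)
  + (1 - \alpha) \, \nabla_\theta \mathcal{L}_{\mathrm{pres}}(\theta^\star) = 0 .
```

Intuitively, at such a point no single update direction decreases both losses to first order, so any further gain in erasure must be paid for in preservation, or vice versa.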