SHIFT: Stochastic Hidden-Trajectory Deflection for Removing Diffusion-based Watermark

arXiv cs.CV / 4/1/2026

Key Points

  • The paper introduces SHIFT (Stochastic Hidden-Trajectory Deflection), a training-free attack against diffusion-based watermarking that targets the verifier’s reliance on reconstructing the diffusion trajectory.
  • SHIFT uses stochastic diffusion resampling to deflect the generative trajectory in latent space so the reconstructed image becomes statistically decoupled from the watermark-embedded trajectory.
  • The attack is designed to disrupt verification while preserving visual quality and semantic consistency.
  • Experiments on nine representative watermarking methods (spanning noise-space, frequency-domain, and optimization-based paradigms) report 95%–100% attack success rates with almost no loss in semantic quality.
  • The method does not require watermark-specific knowledge or any model retraining, making the vulnerability broadly exploitable across paradigms.
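
The summary does not give SHIFT's exact resampling rule. As an assumption for illustration only, one standard way to resample a latent stochastically is to forward-noise the watermarked latent z_0 to a hypothetical intermediate step t* and then denoise it with DDPM ancestral sampling, which draws fresh Gaussian noise at every step. Here ε_θ is the model's noise predictor, α_t = 1 − β_t, ᾱ_t = ∏_{s≤t} α_s, and σ_t² = β_t is the DDPM variance choice; the reverse update runs for t = t*, …, 1:

```latex
\begin{aligned}
z_{t^\ast} &= \sqrt{\bar\alpha_{t^\ast}}\, z_0 + \sqrt{1-\bar\alpha_{t^\ast}}\,\epsilon,
  && \epsilon \sim \mathcal{N}(0, I) \\
z_{t-1} &= \frac{1}{\sqrt{\alpha_t}}\left(z_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(z_t, t)\right) + \sigma_t\,\xi_t,
  && \xi_t \sim \mathcal{N}(0, I),\quad \sigma_t^2 = \beta_t \ (t > 1),\quad \sigma_1 = 0
\end{aligned}
```

Because ε and every ξ_t are fresh draws, the regenerated latent ends a trajectory different from the one the watermark was written into. A verifier that inverts it deterministically (for example with DDIM inversion) recovers noise whose dependence on the original shrinks as t* grows.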

Abstract

Diffusion-based watermarking methods embed verifiable marks by manipulating the initial noise or the reverse diffusion trajectory. However, these methods share a critical assumption: verification can succeed only if the diffusion trajectory can be faithfully reconstructed. This reliance on trajectory recovery constitutes a fundamental and exploitable vulnerability. We propose Stochastic Hidden-Trajectory Deflection (SHIFT), a training-free attack that exploits this common weakness across diverse watermarking paradigms. SHIFT leverages stochastic diffusion resampling to deflect the generative trajectory in latent space, making the reconstructed image statistically decoupled from the original watermark-embedded trajectory while preserving strong visual quality and semantic consistency. Extensive experiments on nine representative watermarking methods spanning noise-space, frequency-domain, and optimization-based paradigms show that SHIFT achieves 95%–100% attack success rates with nearly no loss in semantic quality, without requiring any watermark-specific knowledge or model retraining.
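
The abstract's central observation, that verification works only if the trajectory can be reconstructed, can be made concrete with a small self-contained toy in Python/NumPy. Everything in it is an assumption for illustration, not SHIFT's implementation. Clean latents are vectors drawn from N(mu, I), so the exact noise predictor has a closed form and stands in for a trained model. The watermark is a simplified sign pattern written into the initial noise, loosely in the spirit of noise-space schemes, and the verifier reconstructs the trajectory with deterministic DDIM inversion. The attack is modeled as SDEdit-style forward noising to a hypothetical step `t_star` followed by DDPM ancestral sampling. All function names (`embed_key`, `stochastic_resample`, `bit_accuracy`, and so on) are hypothetical.

```python
import numpy as np

# Toy assumptions (hypothetical, for illustration only):
# clean latents follow N(mu, I). For that distribution the exact noise predictor
# is eps*(z_t, t) = sqrt(1 - abar_t) * (z_t - sqrt(abar_t) * mu), so it can stand
# in for a trained diffusion model.
D = 4096                                                  # latent dimension
BETAS = np.linspace(1e-4, 0.02, 1000)                     # linear DDPM schedule, beta_1..beta_1000
ABAR = np.concatenate([[1.0], np.cumprod(1.0 - BETAS)])   # ABAR[t] = prod_{s<=t}(1 - beta_s); ABAR[0] = 1
GRID = np.arange(0, 1001, 20)                             # 50-step DDIM grid: 0, 20, ..., 1000


def eps_model(z, t, mu):
    """Exact noise prediction E[eps | z_t] for clean data ~ N(mu, I)."""
    return np.sqrt(1.0 - ABAR[t]) * (z - np.sqrt(ABAR[t]) * mu)


def ddim_step(z, t_from, t_to, mu):
    """Deterministic DDIM update; with t_to > t_from it is the usual approximate inversion step."""
    eps = eps_model(z, t_from, mu)
    x0_hat = (z - np.sqrt(1.0 - ABAR[t_from]) * eps) / np.sqrt(ABAR[t_from])
    return np.sqrt(ABAR[t_to]) * x0_hat + np.sqrt(1.0 - ABAR[t_to]) * eps


def generate(z_T, mu):
    """Deterministic DDIM sampling from t = 1000 down to 0."""
    z = z_T
    for t_from, t_to in zip(GRID[::-1][:-1], GRID[::-1][1:]):
        z = ddim_step(z, t_from, t_to, mu)
    return z


def invert(x0, mu):
    """Verifier's trajectory reconstruction: DDIM inversion from t = 0 up to 1000."""
    z = x0
    for t_from, t_to in zip(GRID[:-1], GRID[1:]):
        z = ddim_step(z, t_from, t_to, mu)
    return z


def embed_key(key_bits, rng):
    """Hypothetical noise-space watermark: each key bit sets the sign of one noise coordinate."""
    return np.where(key_bits, 1.0, -1.0) * np.abs(rng.standard_normal(key_bits.shape))


def bit_accuracy(x0, key_bits, mu):
    """Fraction of key bits read back from the signs of the inverted latent."""
    return float(np.mean((invert(x0, mu) > 0) == key_bits))


def stochastic_resample(x0, t_star, mu, rng):
    """Assumed attack: forward-noise x0 to step t_star, then run DDPM ancestral
    sampling back to step 0, drawing fresh noise at every step."""
    z = np.sqrt(ABAR[t_star]) * x0 + np.sqrt(1.0 - ABAR[t_star]) * rng.standard_normal(x0.shape)
    for t in range(t_star, 0, -1):
        beta = BETAS[t - 1]                               # beta_t (BETAS is 0-indexed)
        mean = (z - beta / np.sqrt(1.0 - ABAR[t]) * eps_model(z, t, mu)) / np.sqrt(1.0 - beta)
        noise = rng.standard_normal(z.shape) if t > 1 else 0.0   # no noise on the final step
        z = mean + np.sqrt(beta) * noise
    return z


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
mu = 3.0 * rng.standard_normal(D)                         # toy stand-in for "semantic content"
key = rng.integers(0, 2, D).astype(bool)

watermarked = generate(embed_key(key, rng), mu)
attacked = stochastic_resample(watermarked, t_star=500, mu=mu, rng=rng)
unmarked = generate(rng.standard_normal(D), mu)           # control without a watermark

acc_wm = bit_accuracy(watermarked, key, mu)
acc_attacked = bit_accuracy(attacked, key, mu)
acc_unmarked = bit_accuracy(unmarked, key, mu)
print(f"bit accuracy: watermarked={acc_wm:.3f} attacked={acc_attacked:.3f} unmarked={acc_unmarked:.3f}")
print(f"cosine to mu: before={cosine(watermarked, mu):.3f} after={cosine(attacked, mu):.3f}")
```

In this toy, inversion reads back essentially every key bit from the unattacked sample. After resampling, bit accuracy falls to near chance, about the same as the unwatermarked control, while cosine similarity to the shared content vector mu is essentially unchanged. Raising `t_star` decouples the sample from its original trajectory more strongly; in a real model it would also tend to move the image further from the original content. Because the toy's "content" is only a mean vector, it cannot reproduce the paper's visual- or semantic-quality results.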