Deepfake Detection Generalization with Diffusion Noise

arXiv cs.CV · April 17, 2026


Key Points

  • The paper tackles the problem of deepfake detectors failing to generalize as new synthesis methods—especially diffusion-based deepfakes—emerge.
  • It introduces an Attention-guided Noise Learning (ANL) framework that plugs a pre-trained diffusion model into the detection pipeline to learn robust features by predicting diffusion-step noise.
  • The detector is trained to capture subtle discrepancies between real and synthetic images using the diffusion denoising process, while an attention-guided mechanism encourages focus on globally distributed differences rather than local artifacts.
  • Experiments on multiple benchmarks show ANL substantially improves detection accuracy, reaching state-of-the-art results for identifying diffusion-generated deepfakes, with gains in generalization to unseen models and no added inference overhead.
  • Overall, the work argues that diffusion noise characteristics can serve as a strong regularization signal for more generalizable deepfake detection.
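The noise-prediction objective named in the key points follows the standard DDPM recipe: noise an image with the forward diffusion process, then train a network to recover that noise. The summary does not give ANL's exact schedule or loss, so the sketch below is a minimal NumPy illustration under common assumptions (linear beta schedule, epsilon-prediction MSE):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule, as in standard DDPM (an assumption; the paper's
# exact schedule is not given in this summary).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

def add_noise(x0, t, eps):
    """Forward diffusion: x_t = sqrt(a_bar_t)*x0 + sqrt(1 - a_bar_t)*eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def noise_prediction_loss(eps_pred, eps):
    """Training signal: MSE between predicted and true diffusion-step noise."""
    return float(np.mean((eps_pred - eps) ** 2))

# Toy "image" and its noised version at a mid-range step.
x0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))
t = 500
xt = add_noise(x0, t, eps)

# A perfect noise predictor drives the loss to zero; a random one does not.
print(noise_prediction_loss(eps, eps))                            # 0.0
print(noise_prediction_loss(rng.standard_normal((8, 8)), eps) > 0.0)
```

The intuition from the paper's framing is that real images match the frozen diffusion model's learned distribution, so their noise residuals behave predictably, while synthetic images leave systematic discrepancies that the detector learns to exploit.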

Abstract

Deepfake detectors face growing challenges in generalization as new image synthesis techniques emerge. In particular, deepfakes generated by diffusion models are highly photorealistic and often evade detectors trained on GAN-based forgeries. This paper addresses the generalization problem in deepfake detection by leveraging diffusion noise characteristics. We propose an Attention-guided Noise Learning (ANL) framework that integrates a pre-trained diffusion model into the deepfake detection pipeline to guide the learning of more robust features. Specifically, our method uses the diffusion model's denoising process to expose subtle artifacts: the detector is trained to predict the noise contained in an input image at a given diffusion step, forcing it to capture discrepancies between real and synthetic images. An attention-guided mechanism derived from the predicted noise then encourages the model to focus on globally distributed discrepancies rather than local patterns. By harnessing the frozen diffusion model's learned distribution of natural images, ANL acts as a form of regularization, improving the detector's generalization to unseen forgery types. Extensive experiments demonstrate that ANL significantly outperforms existing methods on multiple benchmarks, achieving state-of-the-art accuracy in detecting diffusion-generated deepfakes. Notably, the proposed framework boosts generalization performance (e.g., improving ACC/AP by a substantial margin on unseen models) without introducing additional inference-time overhead. Our results highlight that diffusion noise provides a powerful signal for generalizable deepfake detection.
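The abstract's attention-guided mechanism derives a spatial map from the predicted noise to spread the detector's focus across the whole image. The exact formulation is not given in this summary, so the construction below (a softmax over noise-residual magnitudes, names and all, is hypothetical) is just one plausible reading:

```python
import numpy as np

def attention_from_noise(eps_pred, tau=1.0):
    """Map a predicted-noise residual to a spatial attention map summing to 1.

    Higher-magnitude residuals receive more weight, so attention tracks
    globally distributed discrepancies rather than a single local artifact.
    (Hypothetical construction; the paper's exact mechanism is unspecified.)
    """
    scores = np.abs(eps_pred) / tau
    w = np.exp(scores - scores.max())  # numerically stable softmax
    return w / w.sum()

rng = np.random.default_rng(1)
eps_pred = rng.standard_normal((16, 16))  # stand-in for a predicted noise map
attn = attention_from_noise(eps_pred)

print(attn.sum())   # normalized to (approximately) 1.0
print(attn.min() > 0.0)
```

A map like this could then reweight intermediate detector features (e.g., `features * attn`), which is one simple way an attention signal acts as the regularizer the abstract describes.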