Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles
arXiv cs.CV · April 29, 2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The paper highlights that deepfake detectors can achieve top results on clean benchmark datasets yet degrade sharply on real-world inputs, where compound degradations such as blur and severe lossy compression cause spatial attention drift.
- It proposes a forensic “foundation-driven” framework that uses an extreme compound degradation engine and a structurally constrained, multi-stream architecture to learn more invariant geometric and semantic priors from DINOv2-Giant.
- The method routes images through three pathways—Global Texture, Localized Facial, and Hybrid Semantic Fusion (with CLIP)—then evaluates spatial attribution stability using Score-CAM and feature stability via cosine similarity.
- A calibrated, discretized voting ensemble is used to suppress background attention drift and improve robustness, with the approach reportedly achieving 4th place in the NTIRE 2026 Robust Deepfake Detection Challenge at CVPR.
- The authors provide accompanying code on GitHub to support reproducibility.
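The feature-stability check mentioned above can be sketched with plain cosine similarity between a backbone embedding of a clean image and that of its degraded counterpart. This is an illustrative sketch only: the variable names (`feat_clean`, `feat_degraded`) and the `STABILITY_THRESHOLD` cutoff are assumptions, not taken from the paper or its code.

```python
# Minimal sketch of a feature-stability check via cosine similarity.
# The embeddings and threshold below are illustrative stand-ins, not the
# paper's actual DINOv2-Giant features or calibration values.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
feat_clean = rng.normal(size=1024)                        # stand-in for a clean-image embedding
feat_degraded = feat_clean + 0.1 * rng.normal(size=1024)  # mildly perturbed embedding

STABILITY_THRESHOLD = 0.9  # hypothetical cutoff for "stable" features
stable = cosine_similarity(feat_clean, feat_degraded) >= STABILITY_THRESHOLD
print(stable)  # True: a small perturbation barely moves the cosine similarity
```

In this framing, a stream whose embeddings stay near-collinear under degradation is treated as reliable, while a large drop in similarity would flag the stream as drifting.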

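The calibrated, discretized voting step can be illustrated as follows: each stream's calibrated fake-probability is discretized into a hard vote, and a simple majority decides. The threshold and the majority rule here are assumptions for illustration; the paper's actual calibration and vote-aggregation details may differ.

```python
# Hedged sketch of a discretized voting ensemble over three detector streams
# (e.g. Global Texture, Localized Facial, Hybrid Semantic Fusion).
# Threshold and aggregation rule are illustrative, not the paper's values.

def discretize(p: float, threshold: float = 0.5) -> int:
    """Map a calibrated fake-probability to a hard vote: 1 = fake, 0 = real."""
    return int(p >= threshold)

def vote(probs: list[float]) -> int:
    """Majority vote over discretized per-stream probabilities."""
    votes = [discretize(p) for p in probs]
    return int(sum(votes) > len(votes) / 2)

# Calibrated probabilities from the three hypothetical streams:
print(vote([0.8, 0.3, 0.7]))  # 1 — two of three streams vote "fake"
```

Discretizing before aggregation means one stream's confidently wrong probability (e.g. from background attention drift) cannot outweigh the other streams, which is the robustness motivation the summary describes.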