Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles

arXiv cs.CV / April 29, 2026


Key Points

  • The paper highlights that deepfake detectors can achieve top results on clean datasets but fail in the real world due to spatial attention drift caused by compound degradations like blur and severe lossy compression.
  • It proposes a forensic “foundation-driven” framework that pairs an extreme compound degradation engine (see the sketch after this list) with a structurally constrained, multi-stream architecture, so that the DINOv2-Giant backbone learns invariant geometric and semantic priors.
  • The method routes images through three pathways—Global Texture, Localized Facial, and Hybrid Semantic Fusion (with CLIP)—then evaluates spatial attribution stability using Score-CAM and feature stability via cosine similarity.
  • A calibrated, discretized voting ensemble is used to suppress background attention drift and improve robustness, with the approach reportedly achieving 4th place in the NTIRE 2026 Robust Deepfake Detection Challenge at CVPR.
  • The authors provide accompanying code on GitHub to support reproducibility.
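
The degradation engine is the most directly reusable idea. The sketch below chains Gaussian blur, heavy JPEG re-encoding, and a resize round trip the way such a pipeline typically would; the operator set, probabilities, and parameter ranges here are illustrative assumptions, since the paper's exact settings are not reproduced in this summary.

```python
import io
import random

from PIL import Image, ImageFilter


def compound_degrade(img: Image.Image) -> Image.Image:
    """Randomly chain blur, lossy compression, and rescaling degradations."""
    # Gaussian blur (the radius range is an illustrative assumption).
    if random.random() < 0.7:
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 4.0)))

    # Severe JPEG re-encoding destroys the high-frequency artifacts that many
    # detectors rely on, pushing the model toward degradation-invariant cues.
    if random.random() < 0.9:
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=random.randint(10, 50))
        buf.seek(0)
        img = Image.open(buf).convert("RGB")

    # Downscale/upscale round trip, another common real-world degradation.
    if random.random() < 0.5:
        w, h = img.size
        s = random.uniform(0.25, 0.75)
        img = img.resize((max(1, int(w * s)), max(1, int(h * s)))).resize((w, h))

    return img
```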

Abstract

Current deepfake detection models achieve state-of-the-art performance on pristine academic datasets but suffer severe spatial attention drift under real-world compound degradations, such as blurring and severe lossy compression. To address this vulnerability, we propose a foundation-driven forensic framework that integrates an extreme compound degradation engine with a structurally constrained, multi-stream architecture. During training, our degradation pipeline systematically destroys high-frequency artifacts, forcing the DINOv2-Giant backbone to extract invariant geometric and semantic priors. We then process images through three specialized pathways: a Global Texture stream, a Localized Facial stream, and a Hybrid Semantic Fusion stream incorporating CLIP. By analyzing spatial attribution via Score-CAM and feature stability via cosine similarity, we quantitatively demonstrate that these streams extract non-redundant, complementary feature representations and stabilize attention entropy. By aggregating these predictions via a calibrated, discretized voting mechanism, our ensemble successfully suppresses background attention drift while acting as a robust geometric anchor. Our approach yields highly stable zero-shot generalization, achieving fourth place in the NTIRE 2026 Robust Deepfake Detection Challenge at CVPR. Code is available at https://github.com/khoalephanminh/ntire26-deepfake-challenge.
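
To make the robustness analysis concrete, here is a minimal sketch of the feature-stability measurement the abstract describes, assuming each stream is a callable that maps an image batch to a feature tensor. `stream`, `clean`, and `degraded` are hypothetical names; the authors' actual evaluation code is in the linked repository.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def feature_stability(stream, clean: torch.Tensor, degraded: torch.Tensor) -> float:
    """Mean cosine similarity between a stream's features on clean inputs
    and on their degraded counterparts; values near 1.0 indicate the stream
    has learned degradation-invariant representations."""
    f_clean = stream(clean)        # (batch, feature_dim)
    f_degraded = stream(degraded)  # same shape as f_clean
    return F.cosine_similarity(f_clean, f_degraded, dim=-1).mean().item()
```

The same metric, computed across streams rather than across degradation levels (after projecting features to a common dimension), is one way to quantify the non-redundancy claim: low cross-stream similarity would indicate the pathways attend to complementary evidence.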
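
The final aggregation step can be sketched similarly. Temperature scaling and simple majority voting are stand-ins here: the abstract specifies only that the votes are calibrated and discretized, not the exact scheme.

```python
import torch


def calibrated_discretized_vote(stream_logits, temperatures, threshold=0.5):
    """Calibrate each stream's logits, discretize to hard 0/1 votes
    (1 = fake), then take the majority as the ensemble decision."""
    votes = []
    for logits, t in zip(stream_logits, temperatures):
        probs = torch.sigmoid(logits / t)          # per-stream calibration
        votes.append((probs >= threshold).long())  # discretized hard vote
    votes = torch.stack(votes, dim=0).float()      # (num_streams, batch)
    return (votes.mean(dim=0) >= 0.5).long()       # majority rule
```

Discretizing before aggregation means a single stream whose attention has drifted onto the background cannot drag the ensemble's score with one extreme probability; it gets exactly one vote, which is plausibly how the ensemble suppresses background attention drift.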