Enhancing Self-Supervised Talking Head Forgery Detection via a Training-Free Dual-System Framework

arXiv cs.CV / 5/6/2026


Key Points

  • Supervised talking head forgery detectors struggle to generalize as generators evolve, so the paper focuses on self-supervised approaches for better cross-generator robustness.
  • It argues that the discriminative capacity of already-trained score-based self-supervised detectors remains underexploited, especially on hard cases where the anomaly ordering is unreliable.
  • The authors propose a Training-Free Dual-System (TFDS) framework in which System-1 takes the detector's anomaly-like scores and applies lightweight, threshold-based routing to split samples into confident and uncertain subsets.
  • System-2 then re-examines only the uncertain subset with evidence-guided, fine-grained reasoning to correct the relative ordering of ambiguous cases, yielding consistent improvements across datasets and perturbation settings.
  • The improvements primarily come from better anomaly ordering within the uncertain subset, suggesting existing detectors already contain useful cues that can be unlocked without additional training.

Abstract

Supervised talking head forgery detection faces severe generalization challenges due to the continuous evolution of generators. By reducing reliance on generator-specific forgery patterns, self-supervised detectors offer stronger cross-generator robustness. However, existing research has mainly focused on building stronger detectors, while the discriminative capacity of trained detectors remains insufficiently exploited. In particular, for score-based self-supervised detectors, the limited discriminative ability on hard cases is often reflected in unreliable anomaly ordering, leaving room for further refinement. Motivated by this observation, we draw inspiration from the dual-system theory of human cognition and propose a Training-Free Dual-System (TFDS) framework to further exploit the latent discriminative capacity of existing score-based self-supervised detectors. TFDS treats anomaly-like scores as the basis of System-1, using lightweight threshold-based routing to partition samples into confident and uncertain subsets. System-2 then revisits only the uncertain subset, performing fine-grained evidence-guided reasoning to refine the relative ordering of ambiguous samples within the original score distribution. Extensive experiments demonstrate consistent improvements across datasets and perturbation settings, with the gains arising mainly from corrected ordering within the uncertain subset. These findings show that existing self-supervised talking head forgery detectors still contain underexploited discriminative cues that can be effectively unlocked through training-free dual-system reasoning.
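
The routing-then-refinement flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how thresholds are chosen or how System-2 produces its evidence, so the two thresholds `low` and `high`, the function names `route` and `refine`, and the per-sample `evidence` values are all hypothetical. The sketch assumes higher scores mean "more anomalous", and it keeps the refined scores inside the original score distribution by reassigning the uncertain subset's own score values according to the evidence ranking.

```python
from typing import Dict, List, Sequence, Tuple


def route(scores: Sequence[float], low: float, high: float) -> Tuple[List[int], List[int]]:
    """System-1 (assumed form): threshold-based routing on anomaly-like scores.

    Samples scoring strictly between `low` and `high` are treated as uncertain;
    everything else is treated as confident. Both thresholds are hypothetical.
    """
    confident: List[int] = []
    uncertain: List[int] = []
    for i, s in enumerate(scores):
        if low < s < high:
            uncertain.append(i)
        else:
            confident.append(i)
    return confident, uncertain


def refine(scores: Sequence[float], uncertain: Sequence[int],
           evidence: Dict[int, float]) -> List[float]:
    """System-2 (assumed form): reorder only the uncertain subset.

    `evidence[i]` stands in for whatever fine-grained reasoning System-2 does;
    larger means "more likely forged". The uncertain samples keep the same set
    of score values, but the values are reassigned in evidence order, so the
    confident samples and the overall score distribution are unchanged.
    """
    refined = list(scores)
    slots = sorted(scores[i] for i in uncertain)          # original values, ascending
    ranked = sorted(uncertain, key=lambda i: evidence[i])  # least to most suspicious
    for idx, value in zip(ranked, slots):
        refined[idx] = value
    return refined


# Toy example with made-up numbers.
scores = [0.05, 0.45, 0.55, 0.95, 0.50]
confident, uncertain = route(scores, low=0.3, high=0.7)
evidence = {1: 0.9, 2: 0.1, 4: 0.5}  # hypothetical System-2 outputs
refined = refine(scores, uncertain, evidence)
print(confident, uncertain)  # [0, 3] [1, 2, 4]
print(refined)               # [0.05, 0.55, 0.45, 0.95, 0.5]
```

In this toy run, samples 1 and 2 swap places within the uncertain band because the assumed evidence contradicts their original ordering, while the confident samples 0 and 3 are untouched. That mirrors the abstract's claim that the gains come from corrected ordering within the uncertain subset rather than from changing confident predictions.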