DVAR: Adversarial Multi-Agent Debate for Video Authenticity Detection

arXiv cs.CV / 4/21/2026


Key Points

  • The paper introduces DVAR (Debate-based Video Authenticity Reasoning), a training-free framework for video authenticity detection, motivated by the failure of conventional detectors to generalize beyond their training distributions as video generation technology rapidly evolves.
  • DVAR reframes detection as structured multi-agent forensic reasoning by running a debate between a Generative Hypothesis Agent and a Natural Mechanism Agent through iterative cross-examination rounds.
  • It adjudicates competing explanations using Occam’s Razor via a Minimum Description Length (MDL) approach, assigning an “Explanatory Cost” to measure the logical burden of each reasoning path.
  • The method also integrates GenVideoKB, a dynamic knowledge repository that supplies high-level reasoning heuristics about generative boundaries and failure modes.
  • Experiments on authenticity detection show DVAR is competitive with supervised state-of-the-art methods while achieving better generalization to previously unseen generative architectures and producing interpretable reasoning traces.
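
The debate and adjudication described in the points above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the summary does not define how Explanatory Cost is computed, so the sketch assumes each agent attaches a plausibility to every assumption it invokes and charges -log2(plausibility) bits for it. The names `run_debate` and `explanatory_cost`, the stub agents, and the example anomalies are all hypothetical, and real agents would presumably be model-backed rather than lookup tables.

```python
import math

def explanatory_cost(account):
    """Total code length in bits (assumed scoring rule, not from the paper):
    an assumption with plausibility p costs -log2(p) bits."""
    return sum(-math.log2(p) for _, p in account.values())

def run_debate(anomalies, agents, rounds=2):
    """Hypothetical debate loop for exactly two agents.

    `agents` maps a hypothesis label to a callable
    (anomaly, rival_claim_or_None) -> (assumption_text, plausibility in (0, 1]).
    """
    accounts = {label: {} for label in agents}
    for _ in range(rounds):
        for anomaly in anomalies:
            for label, agent in agents.items():
                # Cross-examination: each agent sees the rival's latest claim, if any,
                # and may revise its own explanation of the same anomaly.
                rival = next(other for other in agents if other != label)
                rival_claim = accounts[rival].get(anomaly)
                accounts[label][anomaly] = agent(anomaly, rival_claim)
    costs = {label: explanatory_cost(acc) for label, acc in accounts.items()}
    # Occam's Razor: the explanation with the smaller total cost wins.
    verdict = min(costs, key=costs.get)
    return verdict, costs

# Stub agents with fixed answers (illustrative values only).
def generative_agent(anomaly, rival_claim):
    table = {
        "flickering texture": ("temporal inconsistency from synthesis", 0.5),
        "six fingers": ("hand-rendering failure", 0.5),
    }
    return table[anomaly]

def natural_agent(anomaly, rival_claim):
    table = {
        "flickering texture": ("compression artifact", 0.25),
        "six fingers": ("rare anatomy plus motion blur", 0.0625),
    }
    return table[anomaly]

verdict, costs = run_debate(
    ["flickering texture", "six fingers"],
    {"generated": generative_agent, "natural": natural_agent},
)
print(verdict, costs)  # generated {'generated': 2.0, 'natural': 6.0}
```

In this toy run the natural explanation needs two less plausible assumptions (2 + 4 bits) than the generative one (1 + 1 bits), so the cheaper account is preferred. The per-anomaly breakdown in `accounts` is the kind of record that could serve as an interpretable reasoning trace.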

Abstract

The rapid evolution of video generation technologies poses a significant challenge to media forensics, as conventional detection methods often fail to generalize beyond their training distributions. To address this, we propose DVAR (Debate-based Video Authenticity Reasoning), a training-free framework that reformulates video detection as a structured multi-agent forensic reasoning process. Moving beyond the paradigm of pattern matching, DVAR orchestrates a competition between a Generative Hypothesis Agent and a Natural Mechanism Agent. Through iterative rounds of cross-examination, these agents defend their respective explanations against abnormal evidence, driving a logical convergence where the truth emerges from rigorous stress-testing. To adjudicate these conflicting claims, we apply Occam's Razor through the Minimum Description Length (MDL) framework, defining an Explanatory Cost to quantify the "logical burden" of each reasoning path. Furthermore, we integrate GenVideoKB, a dynamic knowledge repository that provides high-level reasoning heuristics on generative boundaries and failure modes. Extensive experiments demonstrate that DVAR achieves competitive performance against supervised state-of-the-art methods while exhibiting superior generalization to unseen generative architectures. By transforming detection into a transparent debate, DVAR provides explicit, interpretable reasoning traces for robust video authenticity assessment.
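
For orientation, the standard two-part form of the MDL principle is shown below. The abstract does not give its exact definition of Explanatory Cost, so reading the two candidate hypotheses as the agents' explanations (written here as the assumed symbols H_gen and H_nat) and D as the abnormal evidence under dispute is an interpretation, not the authors' formula.

```latex
\[
H^{*} \;=\; \arg\min_{H \in \{H_{\mathrm{gen}},\, H_{\mathrm{nat}}\}}
\bigl[\, L(H) + L(D \mid H) \,\bigr]
\]
% L(H):        bits needed to state the hypothesis and its auxiliary assumptions
% L(D \mid H): bits needed to encode the evidence D once H is assumed
% An assumption held with probability p has ideal code length -\log_2 p bits,
% so an ad hoc assumption with p = 1/16 adds 4 bits to L(H).
```

Under this reading, an explanation that must pile up unlikely auxiliary assumptions to absorb each anomaly accumulates a longer description, which is one way to make the "logical burden" of a reasoning path concrete.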