VRAG-DFD: Verifiable Retrieval-Augmentation for MLLM-based Deepfake Detection

arXiv cs.CV / 4/16/2026


Key Points

  • The paper introduces VRAG-DFD, a verifiable retrieval-augmentation framework for MLLM-based deepfake detection aimed at improving performance when professional forgery knowledge is scarce.
  • It combines Retrieval-Augmented Generation (RAG) with reinforcement learning to provide dynamically retrieved forgery knowledge and to support more critical reasoning under noisy references.
  • The authors build two datasets with RAG: the Forensic Knowledge Database (FKD) for forgery knowledge annotation and the Forensic Chain-of-Thought Dataset (F-CoT) for constructing critical chain-of-thought traces, so the model can learn both forensic knowledge and reasoning patterns.
  • Training uses a three-stage pipeline (Alignment → SFT → GRPO) designed to progressively develop the model’s critical reasoning capabilities.
  • Experiments report state-of-the-art and competitive results on deepfake detection generalization tests, suggesting improved robustness beyond static knowledge injection approaches.
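
The retrieval step mentioned in the points above can be pictured with a small sketch. This is not the paper's implementation: the knowledge entries, the 3-dimensional embeddings, and the function names (`retrieve_top_k`, `build_prompt`) are all hypothetical, and plain cosine similarity stands in for whatever retriever the authors actually use. It only shows the general idea of fetching the most relevant forgery knowledge per query and passing it to the model as a reference that may be noisy.

```python
import math

# Hypothetical toy knowledge base; texts and embeddings are made up for illustration.
KB = [
    {"text": "Blending boundary artifacts around the face contour", "embedding": [1.0, 0.0, 0.0]},
    {"text": "Inconsistent reflections between the two eyes", "embedding": [0.0, 1.0, 0.0]},
    {"text": "Over-smoothed skin texture near the cheeks", "embedding": [0.7, 0.7, 0.0]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, knowledge_base, k=2):
    """Return the k (similarity, text) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, entry["embedding"]), entry["text"]) for entry in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question, retrieved):
    """Assemble a prompt that presents retrieved knowledge as a reference to be verified."""
    lines = ["Reference forgery knowledge (may be noisy; verify against the image):"]
    for i, (score, text) in enumerate(retrieved, start=1):
        lines.append(f"[{i}] (sim={score:.2f}) {text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Assumed query embedding for a face image; in practice it would come from an encoder.
query = [0.9, 0.1, 0.0]
print(build_prompt("Is this face image real or fake?", retrieve_top_k(query, KB, k=2)))
```

Because retrieval happens per query, the reference knowledge changes with the input, which is the contrast with static knowledge injection that the summary draws.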

Abstract

In Deepfake Detection (DFD) tasks, researchers have proposed two types of MLLM-based methods: complementary combination with small DFD detectors, or static forgery knowledge injection. The lack of professional forgery knowledge hinders the performance of these DFD-MLLMs. To address this, we consider two questions: how can high-quality, relevant forgery knowledge be provided to MLLMs, and how can MLLMs be endowed with critical reasoning abilities when the reference information is noisy? We offer preliminary answers to both questions by combining Retrieval-Augmented Generation (RAG) and Reinforcement Learning (RL). Building on these techniques, we propose the VRAG-DFD framework, which provides accurate dynamic forgery knowledge retrieval and strong critical reasoning capabilities. Specifically, in terms of data, we construct two datasets with RAG: the Forensic Knowledge Database (FKD) for DFD knowledge annotation, and the Forensic Chain-of-Thought Dataset (F-CoT) for critical CoT construction. In terms of model training, we adopt a three-stage training method (Alignment -> SFT -> GRPO) to gradually cultivate the critical reasoning ability of the MLLM. In terms of performance, VRAG-DFD achieves SOTA and competitive performance on DFD generalization testing.
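
The final training stage named in the abstract, GRPO (Group Relative Policy Optimization), scores each sampled response relative to the other responses drawn for the same input. The sketch below illustrates only that group-relative normalization. The reward function `toy_reward` is a hypothetical stand-in (the abstract does not describe the paper's reward design), and the use of the population standard deviation with a small `eps` is an assumption made to keep the example simple.

```python
import statistics


def toy_reward(predicted_label, true_label, has_reasoning):
    """Hypothetical reward: 1.0 for a correct real/fake verdict, plus 0.1 if reasoning is present."""
    reward = 1.0 if predicted_label == true_label else 0.0
    if has_reasoning:
        reward += 0.1
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps).

    Responses better than the group average get a positive advantage,
    worse ones a negative advantage. If all rewards are equal, every
    advantage is 0.0, so the group contributes no learning signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses for one image whose true label is "fake" (made-up samples).
samples = [
    ("fake", True),
    ("real", True),
    ("fake", False),
    ("real", False),
]
rewards = [toy_reward(pred, "fake", has_reasoning) for pred, has_reasoning in samples]
advantages = group_relative_advantages(rewards)
for (pred, _), r, a in zip(samples, rewards, advantages):
    print(f"pred={pred} reward={r:.1f} advantage={a:+.2f}")
```

In a full GRPO setup these advantages would weight a clipped policy-gradient objective; that part is omitted here.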