Attribution-Guided Multimodal Deepfake Detection via Cross-Modal Forensic Fingerprints
arXiv cs.CV / 4/30/2026
Key Points
- The paper argues that audio-visual deepfake detectors based on simple binary classification often pick up dataset-specific artifacts rather than true generator forensic traces, limiting robustness.
- It introduces the AMDD framework, which performs both detection and generator attribution by using attribution-guided learning as structured regularization on the shared embedding space.
- The proposed Cross-Modal Forensic Fingerprint Consistency (CMFFC) loss aligns generator-induced artifacts across visual and audio streams, leveraging correlated traces created by coherent manipulations.
- Experiments on FakeAVCeleb report very high results (99.7% balanced accuracy, 99.8% AUC) along with strong attribution accuracy (95.9%), and cross-dataset tests show robust real-video detection, while fake detection on unseen generators remains challenging.
- The architecture pairs a ResNet50 video encoder with temporal attention against a ResNet18 audio encoder operating on mel spectrograms, addressing capacity imbalances seen in prior multimodal detectors.
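The summary does not spell out the CMFFC loss formula, but its stated goal (aligning generator-induced fingerprints across the visual and audio streams) can be illustrated with a minimal, hypothetical sketch: treat each modality's fingerprint as an embedding vector and penalize the gap in cosine similarity between them. The function name and the use of cosine similarity are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cmffc_loss(visual_fp: np.ndarray, audio_fp: np.ndarray) -> float:
    """Hypothetical cross-modal fingerprint consistency loss:
    1 - mean cosine similarity between per-sample visual and audio
    fingerprint embeddings, each of shape (batch, dim).
    Returns 0.0 when the two modalities' fingerprints agree exactly."""
    v = visual_fp / np.linalg.norm(visual_fp, axis=1, keepdims=True)
    a = audio_fp / np.linalg.norm(audio_fp, axis=1, keepdims=True)
    cos = np.sum(v * a, axis=1)        # per-sample cosine similarity
    return float(np.mean(1.0 - cos))   # averaged over the batch

# Identical fingerprints across modalities -> loss 0.0
v = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cmffc_loss(v, v))                # 0.0
# Orthogonal fingerprints -> maximal mismatch, loss 1.0
a = np.array([[0.0, 1.0], [1.0, 0.0]])
print(cmffc_loss(v, a))                # 1.0
```

In a full training setup, a term like this would be added as a regularizer alongside the detection and attribution objectives, pushing the shared embedding space to encode generator traces consistently across modalities.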