Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis
arXiv cs.CV / 4/15/2026
Key Points
- The paper argues that most deepfake detection research targets speaking manipulations, but real interactive attacks may alternate between speaking and listening to increase realism and persuasiveness.
- It introduces a new task, Listening Deepfake Detection (LDD), and presents ListenForge, the first dataset tailored to listening forgeries, built using five Listening Head Generation methods.
- To detect listening-specific artifacts, the authors propose MANet, a Motion-aware and Audio-guided Network that models subtle motion inconsistencies in listener video while using speaker audio semantics for cross-modal fusion.
- Experimental results show that existing speaking-centric deepfake detectors generalize poorly to listening scenarios, while MANet performs significantly better on ListenForge.
- The dataset and code are released to support further multimodal forgery analysis in interactive communication settings.
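The summary does not spell out MANet's internals, but the described design (listener motion cues fused with speaker audio semantics) can be illustrated with a minimal sketch. The sketch below is an assumption-laden toy, not the authors' implementation: it computes frame-difference motion features from listener frames and fuses them with speaker audio features via scaled dot-product cross-attention; all function names, feature dimensions, and frame counts are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def motion_features(frames):
    # Toy stand-in for "motion-aware" modeling: first-order
    # frame differences capture subtle temporal inconsistencies.
    return frames[1:] - frames[:-1]          # (T-1, d)

def cross_modal_fusion(motion, audio):
    # Listener motion features query speaker audio semantics
    # (keys/values) via scaled dot-product cross-attention.
    d = motion.shape[-1]
    scores = motion @ audio.T / np.sqrt(d)   # (Tm, Ta)
    attn = softmax(scores, axis=-1)          # rows sum to 1
    return attn @ audio                      # (Tm, d) audio-guided context

# Hypothetical shapes: 17 listener frames and 50 audio frames,
# both embedded into a shared 64-dim feature space.
rng = np.random.default_rng(0)
frames = rng.standard_normal((17, 64))
audio = rng.standard_normal((50, 64))

motion = motion_features(frames)             # (16, 64)
fused = cross_modal_fusion(motion, audio)    # (16, 64)
print(fused.shape)
```

In a real detector, `fused` would feed a classification head that scores each clip as genuine or forged; here it only demonstrates the cross-modal fusion pattern the key points describe.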