Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos
arXiv cs.AI / 3/17/2026
Key Points
- On two benchmarks, real short-form videos show high text-visual consistency and moderate text-audio consistency, while fake videos exhibit the reverse pattern.
- The authors introduce MAGIC3, a detector that explicitly models cross-tri-modal (text, visuals, audio) consistency using both explicit pairwise and global signals derived from cross-modal attention.
- MAGIC3 uses multi-style LLM rewrites to produce style-robust text representations, and an uncertainty-aware classifier that selectively routes samples through a vision-language model (VLM) pathway.
- On FakeSV and FakeTT, MAGIC3 matches VLM-level accuracy while delivering 18-27× higher throughput and 93% VRAM savings, offering a strong cost-performance trade-off.
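
To make the two ideas in the key points concrete, here is a minimal Python sketch of pairwise cross-modal consistency scores and uncertainty-based routing. It is not the authors' implementation. Cosine similarity between pre-computed embeddings stands in for MAGIC3's learned consistency signals, and the names `pairwise_consistency`, `binary_entropy`, `route`, and the `max_entropy` threshold are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_consistency(text, visual, audio):
    """Score each modality pair. Assumption: embeddings share one vector space."""
    return {
        "text_visual": cosine(text, visual),
        "text_audio": cosine(text, audio),
        "visual_audio": cosine(visual, audio),
    }


def binary_entropy(p_fake):
    """Entropy in bits of a two-class prediction; 1.0 at p=0.5, 0.0 at p=0 or p=1."""
    if p_fake <= 0.0 or p_fake >= 1.0:
        return 0.0
    return -(p_fake * math.log2(p_fake) + (1.0 - p_fake) * math.log2(1.0 - p_fake))


def route(p_fake, max_entropy=0.8):
    """Send only uncertain predictions to the expensive VLM pathway.

    max_entropy is a hypothetical threshold, not a value from the paper.
    """
    return "vlm" if binary_entropy(p_fake) > max_entropy else "fast"


if __name__ == "__main__":
    scores = pairwise_consistency([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print(scores)
    print(route(0.5), route(0.99))
```

In this sketch, a confident prediction stays on the lightweight path and only near-chance predictions are escalated, which is the general shape of the throughput and VRAM trade-off described above.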