Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos

arXiv cs.AI / 3/17/2026

Key Points

On two benchmarks, real short-form videos show high text-visual consistency and moderate text-audio consistency, while fake videos exhibit the reverse pattern.
The authors introduce MAGIC3, a detector that models consistency across three modalities (text, visuals, audio) by combining explicit pairwise and global consistency scores with token- and frame-level signals derived from cross-modal attention.
MAGIC3 incorporates multi-style LLM rewrites to produce style-robust text representations, and uses an uncertainty-aware classifier to route only selected videos through a vision-language model (VLM) pathway.
On FakeSV and FakeTT, MAGIC3 matches VLM-level accuracy while delivering 18-27× higher throughput and 93% VRAM savings, offering a strong cost-performance trade-off.
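
The selective routing idea in the points above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of binary entropy as the uncertainty measure, the `max_entropy` threshold of 0.8 bits, and the `slow_model` callable standing in for a VLM are all assumptions made for illustration.

```python
import math
from typing import Callable, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction: 0 is certain, 1 is maximally unsure."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def route(
    p_fake: float,
    slow_model: Callable[[], float],
    max_entropy: float = 0.8,  # hypothetical threshold, not taken from the paper
) -> Tuple[float, str]:
    """Keep the fast detector's score when it is confident; otherwise defer.

    `slow_model` is a placeholder for an expensive VLM call and is only
    invoked for uncertain inputs, which is where the throughput gain comes from.
    """
    if binary_entropy(p_fake) <= max_entropy:
        return p_fake, "fast"
    return slow_model(), "vlm"


if __name__ == "__main__":
    print(route(0.95, lambda: 0.10))  # confident: stays on the fast path
    print(route(0.55, lambda: 0.10))  # uncertain: deferred to the slow path
```

Under this sketch, the fraction of videos sent to the slow path is controlled entirely by the threshold, so the cost-accuracy balance can be tuned without retraining either model.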

Abstract

Short-form video platforms are major channels for news but also fertile ground for multimodal misinformation where each modality appears plausible alone yet cross-modal relationships are subtly inconsistent, like mismatched visuals and captions. On two benchmark datasets, FakeSV (Chinese) and FakeTT (English), we observe a clear asymmetry: real videos exhibit high text-visual but moderate text-audio consistency, while fake videos show the opposite pattern. Moreover, a single global consistency score forms an interpretable axis along which fake probability and prediction errors vary smoothly. Motivated by these observations, we present MAGIC3 (Modal-Adversarial Gated Interaction and Consistency-Centric Classifier), a detector that explicitly models and exposes cross-tri-modal consistency signals at multiple granularities. MAGIC3 combines explicit pairwise and global consistency modeling with token- and frame-level consistency signals derived from cross-modal attention, incorporates multi-style LLM rewrites to obtain style-robust text representations, and employs an uncertainty-aware classifier for selective VLM routing. Using pre-extracted features, MAGIC3 consistently outperforms the strongest non-VLM baselines on FakeSV and FakeTT. While matching VLM-level accuracy, the two-stage system achieves 18-27x higher throughput and 93% VRAM savings, offering a strong cost-performance tradeoff.
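
To make the notion of pairwise and global consistency concrete, here is a minimal Python sketch. It assumes each modality has already been pooled into a single embedding in a shared space, uses cosine similarity as the pairwise score, and takes the mean of the pairwise scores as the global score. These choices are illustrative stand-ins; the paper derives its signals from cross-modal attention, and the function names below are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistency_scores(embeddings: Dict[str, List[float]]) -> Dict[str, float]:
    """Return one score per modality pair plus their mean under the key 'global'."""
    scores = {
        f"{m1}-{m2}": cosine(embeddings[m1], embeddings[m2])
        for m1, m2 in combinations(sorted(embeddings), 2)
    }
    # Assumption: the global score is the plain average of the pairwise scores.
    scores["global"] = sum(scores.values()) / len(scores)
    return scores


if __name__ == "__main__":
    # Toy embeddings: text agrees with visuals but not with audio.
    toy = {"text": [1.0, 0.0], "visual": [1.0, 0.0], "audio": [0.0, 1.0]}
    for name, value in consistency_scores(toy).items():
        print(f"{name}: {value:.3f}")
```

In the toy input the text and visual vectors coincide while the audio vector is orthogonal to both, so the text-visual score is high and the two audio pairs are low.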
