GenVideoLens: Where LVLMs Fall Short in AI-Generated Video Detection?
arXiv cs.CV / 3/20/2026
Key Points
- GenVideoLens is a fine-grained benchmark for evaluating large vision-language models (LVLMs) on AI-generated video detection, enabling dimension-wise assessment rather than a single binary real-or-fake classification.
- The benchmark contains 400 highly deceptive AI-generated videos and 100 real videos, annotated by experts across 15 authenticity dimensions spanning perceptual, optical, physical, and temporal cues.
- Eleven representative LVLMs are evaluated, revealing that models perform relatively well on perceptual cues but struggle with optical consistency, physical interactions, and temporal-causal reasoning.
- Performance varies across models: smaller open-source models sometimes outperform stronger proprietary models on specific cues.
- Temporal perturbation experiments indicate LVLMs underutilize temporal information, providing diagnostic guidance for future improvement of AI-generated video detectors.
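
To make the dimension-wise scoring and the temporal perturbation idea concrete, here is a minimal Python sketch. It is not the authors' evaluation code: the record layout, the dimension names (`texture`, `shadow_consistency`), the function names, and the choice of frame shuffling as the perturbation are all assumptions made for illustration. The idea is to score a model separately on each authenticity dimension, then re-score it on perturbed videos and compare. A near-zero drop on a temporal dimension would suggest the model is not really using frame order.

```python
import random
from collections import defaultdict


def per_dimension_accuracy(records):
    """Accuracy per authenticity dimension.

    `records` is an iterable of (dimension, predicted_label, expert_label)
    tuples. This layout is hypothetical, not the benchmark's actual schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dimension, predicted, expected in records:
        total[dimension] += 1
        if predicted == expected:
            correct[dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


def shuffle_frames(frames, seed=0):
    """Return a copy of `frames` in random order (one possible temporal perturbation)."""
    rng = random.Random(seed)
    shuffled = list(frames)
    rng.shuffle(shuffled)
    return shuffled


def temporal_sensitivity(acc_original, acc_perturbed):
    """Accuracy drop per dimension after perturbation (original minus perturbed)."""
    return {
        dim: acc_original[dim] - acc_perturbed[dim]
        for dim in acc_original
        if dim in acc_perturbed
    }


# Hypothetical model answers on two made-up dimensions.
records = [
    ("texture", "fake", "fake"),
    ("texture", "real", "real"),
    ("shadow_consistency", "real", "fake"),
    ("shadow_consistency", "fake", "fake"),
]
scores = per_dimension_accuracy(records)
```

Reporting one score per dimension, rather than one overall accuracy, is what lets a benchmark of this kind show that a model handles perceptual cues but fails on optical or temporal ones.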