FCMBench-Video: Benchmarking Document Video Intelligence
arXiv cs.CV / 4/29/2026
Key Points
- The paper introduces FCMBench-Video, a new benchmark focused on document video intelligence for financial use cases where accuracy and evidence traceability are critical (e.g., credit review and remote verification).
- Unlike static images, document videos add temporal, sequential evidence that must be integrated across frames while retaining authenticity-relevant acquisition cues.
- To scale realistically while remaining privacy-compliant, the benchmark is built by recording reusable atomic single-document clips, applying controlled degradations, and composing the clips into long-form multi-document videos with specified temporal spans.
- FCMBench-Video includes 495 atomic videos that are composed into 1,200 long-form videos, with 11,322 expert-annotated QA instances across 28 document types and both Chinese and English questions.
- Evaluations of nine recent Video-MLLMs suggest the benchmark meaningfully differentiates systems and capabilities: it shows which tasks are most sensitive to video duration and which probe higher-level evidence integration and robustness (e.g., visual prompt injection).
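
The construction pipeline described in the key points (atomic clips, controlled degradations, composition with temporal spans) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the paper's implementation: the names `AtomicClip`, `Span`, `degrade`, and `compose`, the `gap_s` parameter, and the example document types and degradation label are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class AtomicClip:
    """One reusable single-document recording (hypothetical schema)."""
    clip_id: str
    doc_type: str
    duration_s: float
    degradation: Optional[str] = None  # label only; no pixels are modified here


@dataclass(frozen=True)
class Span:
    """Where one atomic clip sits inside the composed long-form video."""
    clip_id: str
    doc_type: str
    start_s: float
    end_s: float


def degrade(clip: AtomicClip, kind: str) -> AtomicClip:
    """Return a copy of the clip tagged with a degradation label (assumed step)."""
    return replace(clip, degradation=kind)


def compose(clips: List[AtomicClip], gap_s: float = 0.0) -> List[Span]:
    """Lay clips end to end and record each clip's temporal span."""
    spans: List[Span] = []
    cursor = 0.0
    for clip in clips:
        spans.append(Span(clip.clip_id, clip.doc_type, cursor, cursor + clip.duration_s))
        cursor += clip.duration_s + gap_s
    return spans


# Example with made-up clips: two documents separated by a one-second gap.
clips = [
    AtomicClip("a1", "id_card", 8.0),
    degrade(AtomicClip("a2", "bank_statement", 12.5), "motion_blur"),
]
timeline = compose(clips, gap_s=1.0)
for span in timeline:
    print(span)
```

Recording the spans at composition time is what would make answers traceable to a specific segment of the long video; how the benchmark actually stores this information is not stated in the summary.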