DualFact+: A Multimodal Fact Verification Framework for Procedural Video Understanding
arXiv cs.AI / 4/29/2026
Key Points
- The paper introduces DualFact, a dual-layer multimodal evaluation framework that distinguishes conceptual facts from context-grounded facts in procedural video captioning.
- DualFact uses implicit argument augmentation (VIA) and contrastive fact sets to perform more complete and role-consistent factual verification.
- It provides two verification modes: DualFact-T checks against textual evidence, while DualFact-V checks against video-grounded visual evidence.
- Experiments on YouCook3-Fact and CraftBench-Fact find that state-of-the-art multimodal LLMs often generate fluent but factually incomplete captions with systematic omissions and role inconsistencies.
- DualFact aligns more closely with human factuality judgments than standard metrics do, especially for contextual facts, and shows that caption-only evaluation can underestimate or mischaracterize hallucinations compared with video-grounded verification.
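To make the two-layer idea above concrete, here is a minimal sketch of scoring a caption separately on conceptual and context-grounded facts. It is not the paper's implementation: the `Fact` structure, the layer labels, the role names, the exact-match comparison, and the example facts are all assumptions made for illustration, and the summary does not specify how DualFact extracts or matches facts.

```python
from dataclasses import dataclass

# Hypothetical fact representation; the paper's actual schema is not given in this summary.
@dataclass(frozen=True)
class Fact:
    layer: str       # "conceptual" or "contextual" (assumed labels)
    predicate: str   # the action, e.g. "chop"
    roles: tuple     # sorted (role, filler) pairs, e.g. (("object", "onion"),)

def make_fact(layer, predicate, **roles):
    """Build a hashable fact from keyword role fillers."""
    return Fact(layer, predicate, tuple(sorted(roles.items())))

def layer_scores(caption_facts, evidence_facts):
    """Per-layer precision and recall using exact fact matching (a simplification)."""
    scores = {}
    for layer in ("conceptual", "contextual"):
        cap = {f for f in caption_facts if f.layer == layer}
        ref = {f for f in evidence_facts if f.layer == layer}
        matched = cap & ref
        scores[layer] = {
            "precision": len(matched) / len(cap) if cap else 0.0,
            "recall": len(matched) / len(ref) if ref else 0.0,
        }
    return scores

# Evidence facts (from text or video annotation, depending on the verification mode).
evidence = [
    make_fact("conceptual", "chop", object="onion"),
    make_fact("contextual", "chop", object="onion", instrument="knife"),
]

# Caption facts: the action is right, but the instrument role is wrong.
caption = [
    make_fact("conceptual", "chop", object="onion"),
    make_fact("contextual", "chop", object="onion", instrument="spoon"),
]

scores = layer_scores(caption, evidence)
print(scores)
```

In this toy case the caption scores perfectly on the conceptual layer but fails on the contextual layer, which is the kind of role inconsistency a single blended metric could hide.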