From Priors to Perception: Grounding Video-LLMs in Physical Reality
arXiv cs.CV / 5/7/2026
Key Points
- Video-LLMs can show systematic weaknesses in fine-grained physical reasoning, including failures when visuals contradict statistical expectations.
- The paper attributes these errors to “Semantic Prior Dominance,” in which internal narrative priors override what the model actually sees, rather than to a basic failure of perception.
- It introduces the Programmatic Adversarial Curriculum (PACC), a high-fidelity adversarial video dataset generated from physical laws to separate visual artifacts from true logical failures.
- It also proposes Visual-Anchored Reasoning Chain (VARC), which requires models to ground judgments in low-level visual facts before performing logical reasoning.
- Experiments indicate that standard LoRA fine-tuning using PACC (without architectural changes) substantially improves state-of-the-art models’ physical reasoning performance.
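The core PACC idea, generating stimuli directly from physical laws and then producing prior-violating counterparts, can be illustrated with a toy sketch. This is a hypothetical simplification for intuition only (the paper's pipeline renders high-fidelity video, not 1-D trajectories), and every name below is illustrative:

```python
# Toy sketch of the PACC generation idea: pair a physically consistent
# trajectory with a law-violating counterfactual, so a model must read
# the observations rather than trust the prior "thrown objects fall back".
# Names and parameters are illustrative, not from the paper.

def trajectory(v0: float, g: float, dt: float = 0.05, steps: int = 40):
    """Vertical positions of a projectile launched upward at speed v0
    under constant acceleration -g (g > 0 means normal gravity).
    Simple explicit Euler integration."""
    ys = []
    y, v = 0.0, v0
    for _ in range(steps):
        ys.append(y)
        v -= g * dt
        y += v * dt
    return ys

# Physically consistent clip: gravity brings the ball back down.
physical = trajectory(v0=5.0, g=9.81)

# Adversarial clip: sign-flipped gravity, so the ball keeps rising,
# contradicting the statistical prior baked into a video-LLM.
adversarial = trajectory(v0=5.0, g=-9.81)
```

Because both clips are generated from an explicit (possibly perturbed) law, the ground-truth label is known by construction, which is what lets the benchmark separate perception errors from reasoning errors.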