
IRIS: A Real-World Benchmark for Inverse Recovery and Identification of Physical Dynamic Systems from Monocular Video

arXiv cs.CV / 3/18/2026


Key Points

  • IRIS introduces a high-fidelity benchmark for inverse recovery and identification of physical dynamic systems from monocular video, comprising 220 real-world 4K/60 fps sequences with independently measured ground-truth parameters and uncertainty estimates.
  • It defines a standardized evaluation protocol covering parameter accuracy, identifiability, extrapolation, robustness, and governing-equation selection.
  • The work evaluates multiple baselines, including a multi-step physics loss and four equation-identification strategies—VLM temporal reasoning, describe-then-classify prompting, CNN-based classification, and path-based labelling—across IRIS scenarios.
  • The dataset, annotations, evaluation toolkit, and all baseline implementations are publicly released to enable reproducible benchmarking.
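The multi-step physics loss mentioned above can be illustrated with a small sketch: roll a candidate dynamical model forward for several steps under hypothesized parameters and penalize the deviation from the observed trajectory. The damped-pendulum model, the semi-implicit Euler integrator, and the function names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multi_step_physics_loss(theta, x0, v0, observed, dt, n_steps):
    """Hypothetical multi-step physics loss for a damped pendulum.

    theta = (g/l, damping): candidate physical parameters.
    Rolls the model forward with semi-implicit Euler and accumulates
    the squared error against the observed angle trajectory.
    """
    g_over_l, damping = theta
    x, v = x0, v0
    loss = 0.0
    for t in range(n_steps):
        # x'' = -(g/l) sin(x) - c x'  (semi-implicit Euler step)
        v = v + dt * (-g_over_l * np.sin(x) - damping * v)
        x = x + dt * v
        loss += (x - observed[t]) ** 2
    return loss / n_steps
```

Minimizing such a loss over `theta` (e.g. with gradient descent or a grid search) recovers the physical parameters; rolling out multiple steps, rather than matching single-step derivatives, is what makes the loss sensitive to slowly accumulating parameter errors.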

Abstract

Unsupervised physical parameter estimation from video lacks a common benchmark: existing methods evaluate on non-overlapping synthetic data, the sole real-world dataset is restricted to single-body systems, and no established protocol addresses governing-equation identification. This work introduces IRIS, a high-fidelity benchmark comprising 220 real-world videos captured at 4K resolution and 60 fps, spanning both single- and multi-body dynamics with independently measured ground-truth parameters and uncertainty estimates. Each dynamical system is recorded under controlled laboratory conditions and paired with its governing equations, enabling principled evaluation. A standardized evaluation protocol is defined encompassing parameter accuracy, identifiability, extrapolation, robustness, and governing-equation selection. Multiple baselines are evaluated, including a multi-step physics loss formulation and four complementary equation-identification strategies (VLM temporal reasoning, describe-then-classify prompting, CNN-based classification, and path-based labelling), establishing reference performance across all IRIS scenarios and exposing systematic failure modes that motivate future research. The dataset, annotations, evaluation toolkit, and all baseline implementations are publicly released.
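Because the ground-truth parameters come with measured uncertainties, parameter accuracy can be scored both as a relative error and as acceptance within the measurement's uncertainty band. The two helpers below are a minimal sketch of such a scoring rule; the function names, the relative-error definition, and the k-sigma acceptance criterion are assumptions, not the benchmark's exact metric.

```python
def relative_error(pred, gt):
    """Plain relative error |pred - gt| / |gt| of a parameter estimate.
    A common choice; the benchmark's exact metric is defined in the paper."""
    return abs(pred - gt) / abs(gt)

def within_uncertainty(pred, gt, sigma, k=2.0):
    """Accept an estimate if it lies inside the k-sigma band of the
    independently measured ground truth (hypothetical acceptance rule)."""
    return abs(pred - gt) <= k * sigma
```

For example, an estimate of 9.9 for a ground truth of 10.0 measured with sigma 0.06 has a 1% relative error and falls inside the 2-sigma band, while an estimate of 9.0 would be rejected.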