ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video

arXiv cs.CV / 4/10/2026


Key Points

  • ReconPhys is a feedforward method that jointly estimates not only the appearance and 3D shape (via 3D Gaussian Splatting) of non-rigid objects from a single monocular video, but also their physical attributes.
  • Its key feature is a self-supervised training strategy that removes the need for physics labels, replacing the expensive tuning and manual annotation required by prior methods based on differentiable rendering.
  • On a large-scale synthetic dataset, the method is reported to improve future-prediction PSNR to 21.64 (versus 13.27 for the state-of-the-art optimization baseline) and to reduce Chamfer Distance from 0.349 to 0.004.
  • Inference completes in under one second, versus the hours of per-scene optimization required by existing methods, which is said to enable rapid generation of simulation-ready assets for robotics and graphics.
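The two metrics quoted above are standard ones. As a point of reference, here is a minimal sketch of how PSNR and a symmetric Chamfer Distance are typically computed; the function names, array shapes, and the use of unsquared nearest-neighbor distances are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    # Peak signal-to-noise ratio between two images (higher is better).
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def chamfer_distance(a, b):
    # Symmetric Chamfer distance between two point sets a (N,3) and b (M,3),
    # lower is better: mean nearest-neighbor distance in both directions.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

For example, comparing an all-zero image against a constant 0.1 image gives an MSE of 0.01 and hence a PSNR of 20 dB, which helps calibrate the gap between 13.27 and 21.64 reported here.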

Abstract

Reconstructing non-rigid objects with physical plausibility remains a significant challenge. Existing approaches leverage differentiable rendering for per-scene optimization, recovering geometry and dynamics but requiring expensive tuning or manual annotation, which limits practicality and generalizability. To address this, we propose ReconPhys, the first feedforward framework that jointly learns physical attribute estimation and 3D Gaussian Splatting reconstruction from a single monocular video. Our method employs a dual-branch architecture trained via a self-supervised strategy, eliminating the need for ground-truth physics labels. Given a video sequence, ReconPhys simultaneously infers geometry, appearance, and physical attributes. Experiments on a large-scale synthetic dataset demonstrate superior performance: our method achieves 21.64 PSNR in future prediction compared to 13.27 by state-of-the-art optimization baselines, while reducing Chamfer Distance from 0.349 to 0.004. Crucially, ReconPhys enables fast inference (<1 second) versus hours required by existing methods, facilitating rapid generation of simulation-ready assets for robotics and graphics.
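The abstract describes a dual-branch architecture: a shared video encoding feeds one branch that outputs 3D Gaussian Splatting parameters and another that outputs physical attributes. The sketch below illustrates only this data flow with random weights; the layer sizes, Gaussian count, per-Gaussian parameterization (position, rotation, scale, opacity, color), and the choice of physical attributes are all assumptions for illustration, not the paper's actual architecture:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dual_branch_forward(video_feats, rng):
    # video_feats: (T, F) per-frame features for T frames (assumed input format).
    T, F = video_feats.shape
    W_enc = rng.standard_normal((F, 64)) * 0.1
    h = relu(video_feats @ W_enc).mean(axis=0)  # pooled video embedding, shape (64,)

    # Branch 1: per-Gaussian reconstruction parameters.
    # 3 position + 4 rotation (quaternion) + 3 scale + 1 opacity + 3 color = 14.
    N = 256  # number of Gaussians (illustrative)
    W_gauss = rng.standard_normal((64, N * 14)) * 0.1
    gaussians = (h @ W_gauss).reshape(N, 14)

    # Branch 2: global physical attributes, e.g. stiffness, damping, density
    # (an illustrative choice; the paper's attribute set may differ).
    W_phys = rng.standard_normal((64, 3)) * 0.1
    physics = h @ W_phys
    return gaussians, physics

rng = np.random.default_rng(0)
gaussians, physics = dual_branch_forward(rng.standard_normal((8, 32)), rng)
```

Because both branches are plain feedforward maps from one shared embedding, a single pass suffices at inference time, which is consistent with the sub-second runtime claimed in the abstract.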