UniCon3R: Contact-aware 3D Human-Scene Reconstruction from Monocular Video
arXiv cs.CV · April 23, 2026
Key Points
- UniCon3R introduces a feed-forward framework for real-time 4D (3D + time) human-scene reconstruction from monocular video, jointly producing world-coordinate human motion and scene geometry.
- The work argues that the physically implausible artifacts of prior methods (e.g., floating bodies or penetrations) stem from failing to model human-environment physical interactions.
- UniCon3R predicts 3D human-scene contact from human pose and scene geometry, and uses contact not only as an auxiliary signal but as an active corrective cue during pose generation.
- Experiments on RICH, EMDB, 3DPW, and SLOPER4D show improved physical plausibility and better global human motion estimation versus state-of-the-art baselines, while maintaining real-time online inference.
- The authors claim contact functions as a powerful internal prior for physically grounded joint reconstruction, suggesting a new paradigm for the task.
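The paper's exact formulation is not given in this summary, but the idea of using predicted contact as an active corrective cue can be sketched minimally. The snippet below is a hypothetical illustration, not UniCon3R's actual method: the scene is simplified to a ground plane, and per-joint contact probabilities (assumed inputs) drive two corrections, snapping contact joints onto the surface and clamping any penetration below it.

```python
import numpy as np

def contact_correction(joints, contact_prob, ground_height=0.0, thresh=0.5):
    """Hypothetical contact-as-corrective-cue step (illustrative only).

    joints:       (J, 3) world-coordinate joint positions, z-up.
    contact_prob: (J,)   predicted probability that each joint touches the scene.
    """
    corrected = joints.copy()
    in_contact = contact_prob > thresh
    # Joints classified as in contact are pulled onto the (planar) surface,
    # removing "floating body" artifacts.
    corrected[in_contact, 2] = ground_height
    # No joint may penetrate below the surface, contact or not.
    corrected[:, 2] = np.maximum(corrected[:, 2], ground_height)
    return corrected

# A floating foot (z = 0.08, high contact prob) is snapped to the ground,
# a penetrating hand (z = -0.03) is clamped, and a free joint is untouched.
joints = np.array([[0.0, 0.0, 0.08],
                   [0.3, 0.0, -0.03],
                   [0.0, 0.0, 1.20]])
probs = np.array([0.9, 0.2, 0.0])
print(contact_correction(joints, probs))
```

In the actual system such a correction would presumably act on dense contact over general scene geometry and feed back into pose generation, rather than post-processing joints against a plane as done here.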