UniPR: Unified Object-level Real-to-Sim Perception and Reconstruction from a Single Stereo Pair

arXiv cs.CV / 3/23/2026



Key Points

UniPR is an end-to-end object-level real-to-sim perception and reconstruction framework that operates on a single stereo image pair.
It replaces multi-stage modular pipelines (detection, segmentation, shape reconstruction, pose estimation) with a single forward pass, and uses the geometric constraints of the stereo pair to resolve scale ambiguity.
It introduces Pose-Aware Shape Representation to bridge reconstruction and pose estimation without per-category canonical shapes.
It introduces LVS6D, a large-vocabulary stereo dataset with over 6,300 objects to support large-scale research and evaluation.
Experiments show UniPR reconstructs all objects in a scene in parallel, preserving true physical proportions and offering significant efficiency gains for real-world robotics.

Abstract

Perceiving and reconstructing objects from images is critical for real-to-sim transfer tasks, which are widely used in the robotics community. Existing methods rely on multiple submodules such as detection, segmentation, shape reconstruction, and pose estimation to complete the pipeline. However, such modular pipelines suffer from inefficiency and cumulative error, as each stage operates on only partial or locally refined information while discarding global context. To address these limitations, we propose UniPR, the first end-to-end object-level real-to-sim perception and reconstruction framework. Operating directly on a single stereo image pair, UniPR leverages geometric constraints to resolve the scale ambiguity. We introduce Pose-Aware Shape Representation to eliminate the need for per-category canonical definitions and to bridge the gap between reconstruction and pose estimation tasks. Furthermore, we construct a large-vocabulary stereo dataset, LVS6D, comprising over 6,300 objects, to facilitate large-scale research in this area. Extensive experiments demonstrate that UniPR reconstructs all objects in a scene in parallel within a single forward pass, achieving significant efficiency gains and preserving true physical proportions across diverse object types, which highlights its potential for practical robotic applications.
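
As background on why a stereo pair can resolve scale ambiguity, here is a minimal sketch of the textbook rectified-stereo relation, depth = focal length × baseline / disparity. It is not UniPR's method, which the abstract does not detail; the function names, parameter names, and camera values are all illustrative assumptions.

```python
# Textbook rectified-stereo geometry, shown only as background.
# This is NOT UniPR's implementation; all names and values are hypothetical.

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Metric depth (meters) of a point seen in a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # A known baseline in meters is what fixes the absolute scale.
    return focal_px * baseline_m / disparity_px


def backproject(u: float, v: float, disparity_px: float,
                focal_px: float, baseline_m: float,
                cx: float, cy: float) -> tuple[float, float, float]:
    """Lift a left-image pixel (u, v) to a 3D point in the left camera frame."""
    z = depth_from_disparity(focal_px, baseline_m, disparity_px)
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


if __name__ == "__main__":
    # Assumed camera: 800 px focal length, 12.5 cm baseline, principal point (320, 240).
    point = backproject(u=420.0, v=240.0, disparity_px=50.0,
                        focal_px=800.0, baseline_m=0.125, cx=320.0, cy=240.0)
    print(point)
```

Because the baseline is a physical length, the recovered coordinates are in meters rather than up to an unknown scale factor, which is the property a monocular image alone cannot provide.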


