Point2Pose: Occlusion-Recovering 6D Pose Tracking and 3D Reconstruction for Multiple Unknown Objects Via 2D Point Trackers

arXiv cs.RO / 4/14/2026


Key Points

  • Point2Pose is a model-free approach for causal 6D pose tracking and 3D reconstruction of multiple rigid objects from monocular RGB-D video, initialized using only sparse points on the objects.
  • It tracks unseen objects without CAD models or category priors: a 2D point tracker supplies long-range correspondences, which lets tracking recover immediately after complete occlusion.
  • The method incrementally builds an online TSDF representation for the tracked targets, enabling simultaneous pose estimation and surface/geometry reconstruction.
  • The authors also introduce a new multi-object tracking dataset (simulation + real-world) with motion-capture ground truth for evaluation.
  • Experiments indicate performance comparable to state-of-the-art methods on a severe-occlusion benchmark, while adding multi-object tracking and recovery from complete occlusion, which earlier model-free tracking approaches do not support.
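
To make the correspondence idea in the points above concrete, here is a minimal sketch of one way tracked 2D points could be turned into a 6D pose: lift each tracked pixel to 3D using the depth image, then solve for the rigid transform between the reference and current point sets with the Kabsch algorithm. This is an illustration under stated assumptions, not the authors' pipeline; the function names `backproject` and `rigid_transform`, the pinhole intrinsics `fx, fy, cx, cy`, and the choice of Kabsch are all hypothetical here, and the summary does not describe how Point2Pose actually solves for pose or rejects outlier tracks.

```python
import numpy as np


def backproject(uv, depth, fx, fy, cx, cy):
    """Lift pixels (N, 2) with depths (N,) to camera-frame 3D points (N, 3).

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    uv = np.asarray(uv, dtype=float)
    z = np.asarray(depth, dtype=float)
    x = (uv[:, 0] - cx) * z / fx
    y = (uv[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def rigid_transform(src, dst):
    """Least-squares rotation R and translation t with dst ≈ src @ R.T + t.

    Standard Kabsch solution via SVD; needs at least 3 non-collinear points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```

In this sketch, `src` would hold the back-projected points from a reference frame and `dst` the same tracked points in the current frame. Because the correspondences tie the current frame directly to the reference rather than to the previous frame, a pose can be solved as soon as enough points reappear, which is one plausible reading of how long-range tracks support recovery after complete occlusion.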

Abstract

We present Point2Pose, a model-free method for causal 6D pose tracking of multiple rigid objects from monocular RGB-D video. Initialized only from sparse image points on the objects to be tracked, our approach tracks multiple unseen objects without requiring object CAD models or category priors. Point2Pose leverages a 2D point tracker to obtain long-range correspondences, enabling instant recovery after complete occlusion. Simultaneously, the system incrementally reconstructs an online Truncated Signed Distance Function (TSDF) representation of the tracked targets. Alongside the method, we introduce a new multi-object tracking dataset comprising both simulation and real-world sequences, with motion-capture ground truth for evaluation. Experiments show that Point2Pose achieves performance comparable to state-of-the-art methods on a severe-occlusion benchmark, while additionally supporting multi-object tracking and recovery from complete occlusion, capabilities that are not supported by previous model-free tracking approaches.
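
As a rough illustration of what incremental TSDF reconstruction involves, the sketch below fuses depth measurements into voxels along a single camera ray using the classic weighted running average. It is a simplified, assumed formulation: the function name `integrate_tsdf`, the parameters `trunc` and `max_weight`, and the one-dimensional ray layout are hypothetical, and the abstract does not specify the fusion rule Point2Pose uses.

```python
def integrate_tsdf(tsdf, weights, voxel_depths, measured_depth,
                   trunc=0.05, max_weight=50.0):
    """Fuse one depth measurement into the voxels along one camera ray.

    tsdf, weights: lists updated in place, one entry per voxel.
    voxel_depths:  depth of each voxel centre along the ray (metres).
    trunc:         truncation distance (hypothetical default).
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z  # positive in front of the surface
        if sdf < -trunc:
            continue  # far behind the surface: occluded, leave untouched
        d = min(1.0, sdf / trunc)  # normalise and clamp to [-1, 1]
        w = weights[i]
        # Weighted running average of all observations so far.
        tsdf[i] = (tsdf[i] * w + d) / (w + 1.0)
        weights[i] = min(w + 1.0, max_weight)
    return tsdf, weights
```

In a full system the voxel grid would live in the object's own frame, so each new depth frame would first be transformed by the estimated pose before integration; the reconstructed surface is then the zero crossing of the fused values.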