MultiCam: On-the-fly Multi-Camera Pose Estimation Using Spatiotemporal Overlaps of Known Objects

arXiv cs.CV / 3/25/2026


Key Points

  • The paper proposes MultiCam, an on-the-fly multi-camera pose estimation method for dynamic multi-camera AR that leverages known objects in the scene rather than relying on continuously visible markers.
  • It achieves constant pose updates by enhancing an existing object pose estimator to maintain a spatiotemporal scene graph, enabling relationships between cameras even when their fields of view do not overlap.
  • The approach explicitly targets the marker-based tracking limitation that markers must remain within each camera’s field of view.
  • The authors introduce a new multi-camera, multi-object dataset with temporal field-of-view overlap (supporting both static and dynamic camera setups) to evaluate the method.
  • Experiments show improved camera pose accuracy over state-of-the-art methods on standard benchmarks (YCB-V and T-LESS) in overlapping scenarios, supporting the effectiveness of a marker-less AR pipeline.
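The core geometric idea behind relating cameras through known objects can be illustrated with a small sketch (not the authors' implementation): if two cameras each estimate the 6-DoF pose of the same object, the relative pose between the cameras follows by composing the two transforms. All names and values below are illustrative assumptions.

```python
# Hypothetical sketch of relating two cameras via a commonly observed
# object, using 4x4 homogeneous transforms (illustrative only).
import numpy as np

def pose(yaw_deg=0.0, t=(0.0, 0.0, 0.0)):
    """Build a 4x4 homogeneous transform from a yaw angle and a translation."""
    a = np.radians(yaw_deg)
    T = np.eye(4)
    T[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    T[:3, 3] = t
    return T

def relative_camera_pose(T_camA_obj, T_camB_obj):
    """Pose of camera B expressed in camera A's frame via the shared object:
    T_A_B = T_A_obj @ inv(T_B_obj)."""
    return T_camA_obj @ np.linalg.inv(T_camB_obj)

# Both cameras observe the same known object (made-up example poses):
T_A_obj = pose(30, (0.5, 0.0, 2.0))    # object pose estimated in camera A
T_B_obj = pose(-15, (-0.2, 0.1, 1.5))  # object pose estimated in camera B

T_A_B = relative_camera_pose(T_A_obj, T_B_obj)

# Sanity check: mapping a point from the object frame through B and then
# T_A_B agrees with mapping it directly from the object frame into A.
p_obj = np.array([0.1, 0.2, 0.3, 1.0])
assert np.allclose(T_A_B @ T_B_obj @ p_obj, T_A_obj @ p_obj)
```

This is the standard pose-graph composition trick; the paper's contribution lies in maintaining such relations on the fly as objects and cameras move.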

Abstract

Multi-camera dynamic Augmented Reality (AR) applications require camera pose estimation to combine the individual information from each camera in one common system. This can be achieved by relating contextual information, such as markers or objects, across multiple views. Cameras are commonly calibrated in an initial step or updated through the continuous use of markers; an alternative is to leverage information already present in the scene, such as known objects. A further downside of marker-based tracking is that the markers must remain inside the field of view (FoV) of the cameras. To overcome these limitations, we propose continuous, dynamic camera pose estimation that leverages spatiotemporal FoV overlaps of known objects on the fly. To achieve this, we enhance a state-of-the-art object pose estimator to update our spatiotemporal scene graph, enabling relations even among cameras with non-overlapping FoVs. To evaluate our approach, we introduce a multi-camera, multi-object pose estimation dataset with temporal FoV overlap, including static and dynamic cameras. Furthermore, in FoV-overlapping scenarios, we outperform the state of the art in camera pose accuracy on the widely used YCB-V and T-LESS datasets. Our performance on both previous datasets and our proposed dataset validates the effectiveness of our marker-less approach for AR applications. The code and dataset are available at https://github.com/roth-hex-lab/IEEE-VR-2026-MultiCam.
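The "temporal" part of the spatiotemporal scene graph can be sketched as follows: if camera A sees an object at time t0 and camera B sees the same object at a later time t1, the two cameras can still be linked, assuming the object has not moved in between. The class, method names, and example poses below are hypothetical, not the paper's API.

```python
# Hedged sketch of a minimal spatiotemporal scene graph that links cameras
# through object sightings over time, even when the cameras never share a
# FoV simultaneously (assumes the object is static between sightings).
import numpy as np
from collections import defaultdict, deque

class SceneGraph:
    def __init__(self):
        self.edges = {}               # (a, b) -> T_a_b (4x4, b's frame into a's)
        self.adj = defaultdict(set)   # camera adjacency
        self.last_seen = {}           # object_id -> (camera, T_cam_obj)

    def observe(self, camera, object_id, T_cam_obj):
        """Register a detection; link this camera to the last camera that
        saw the same object (temporal FoV overlap)."""
        if object_id in self.last_seen:
            prev_cam, T_prev_obj = self.last_seen[object_id]
            if prev_cam != camera:
                T = T_prev_obj @ np.linalg.inv(T_cam_obj)  # prev_cam <- camera
                self.edges[(prev_cam, camera)] = T
                self.edges[(camera, prev_cam)] = np.linalg.inv(T)
                self.adj[prev_cam].add(camera)
                self.adj[camera].add(prev_cam)
        self.last_seen[object_id] = (camera, T_cam_obj)

    def relative_pose(self, src, dst):
        """BFS over the graph, composing transforms; returns T_src_dst
        or None if the cameras are not (yet) connected."""
        queue, visited = deque([(src, np.eye(4))]), {src}
        while queue:
            cam, T = queue.popleft()
            if cam == dst:
                return T
            for nxt in self.adj[cam]:
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, T @ self.edges[(cam, nxt)]))
        return None

# Usage: camA sees the mug at t0; camB sees the same mug later at t1.
g = SceneGraph()
T_A_obj = np.eye(4); T_A_obj[:3, 3] = [1.0, 0.0, 0.0]
T_B_obj = np.eye(4); T_B_obj[:3, 3] = [0.0, 1.0, 0.0]
g.observe("camA", "mug", T_A_obj)   # at t0
g.observe("camB", "mug", T_B_obj)   # at t1, camA no longer sees the mug
T_AB = g.relative_pose("camA", "camB")
```

Chaining edges this way is what allows relations "even among cameras with non-overlapping FoVs": camA and camC can be related through camB if each pair shares an object sighting at some point in time.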