JRM: Joint Reconstruction Model for Multiple Objects without Alignment

arXiv cs.CV / 3/30/2026

Key Points

  • The paper introduces the Joint Reconstruction Model (JRM) for object-centric 3D reconstruction, which improves consistency by exploiting repetition: the same object model appearing multiple times within a scene or across scans.
  • Prior methods rely on explicit matching and rigid alignment, which makes them sensitive to errors and hard to extend to non-rigid transformations. JRM instead uses a 3D flow-matching generative model that aggregates unaligned observations implicitly in its latent space.
  • JRM frames reconstruction as personalized generation: all observations share a common subject that must stay consistent, while each reconstruction still follows the specific pose and state of its own observation.
  • Experiments on synthetic and real-world datasets indicate that removing explicit alignment improves robustness to incorrect associations and supports non-rigid changes like articulation.
  • The authors report that JRM outperforms both independent reconstruction baselines and alignment-based approaches in overall reconstruction quality.

Abstract

Object-centric reconstruction seeks to recover the 3D structure of a scene through composition of independent objects. While this independence can simplify modeling, it discards strong signals that could improve reconstruction, notably repetition where the same object model is seen multiple times in a scene, or across scans. We propose the Joint Reconstruction Model (JRM) to leverage repetition by framing object reconstruction as one of personalized generation: multiple observations share a common subject that should be consistent for all observations, while still adhering to the specific pose and state from each. Prior methods in this direction rely on explicit matching and rigid alignment across observations, making them sensitive to errors and difficult to extend to non-rigid transformations. In contrast, JRM is a 3D flow-matching generative model that implicitly aggregates unaligned observations in its latent space, learning to produce consistent and faithful reconstructions in a data-driven manner without explicit constraints. Evaluations on synthetic and real-world data show that JRM's implicit aggregation removes the need for explicit alignment, improves robustness to incorrect associations, and naturally handles non-rigid changes such as articulation. Overall, JRM outperforms both independent and alignment-based baselines in reconstruction quality.
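
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a generic flow-matching training step conditioned on a set of unaligned observations. It is not the authors' implementation. The linear noise-to-data path is the standard flow-matching construction; everything else is an assumption made for illustration: mean pooling as the aggregation step (chosen only because it ignores observation order and needs no alignment), the linear `toy_velocity_model` standing in for a real network, and all function names and tensor shapes.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 to data x1, plus its velocity."""
    t = np.asarray(t, dtype=x0.dtype).reshape(-1, 1)
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0  # velocity of the linear path, constant in t
    return x_t, v_target

def pool_observations(obs_latents):
    """Order-independent aggregation: (batch, n_obs, dim) -> (batch, dim).

    Assumption: mean pooling is a stand-in; the abstract does not say how
    JRM aggregates observations in latent space.
    """
    return obs_latents.mean(axis=1)

def toy_velocity_model(x_t, t, context, W):
    """Hypothetical linear stand-in for a learned velocity network."""
    t_col = np.asarray(t, dtype=x_t.dtype).reshape(-1, 1)
    features = np.concatenate([x_t, context, t_col], axis=1)
    return features @ W

def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))

rng = np.random.default_rng(0)
batch, n_obs, dim = 4, 3, 8
x1 = rng.normal(size=(batch, dim))           # stand-in for target shape latents
x0 = rng.normal(size=(batch, dim))           # Gaussian noise sample
t = rng.uniform(size=batch)                  # one time in [0, 1) per example
obs = rng.normal(size=(batch, n_obs, dim))   # unaligned per-observation latents
context = pool_observations(obs)
W = rng.normal(size=(2 * dim + 1, dim)) * 0.1

x_t, v_target = interpolate(x0, x1, t)
loss = flow_matching_loss(toy_velocity_model(x_t, t, context, W), v_target)
```

Because the pooled context does not depend on the order of the observations, no matching or rigid alignment step is needed before conditioning, which is the property the abstract attributes to implicit aggregation. A real model would replace the mean with a learned aggregation and the linear map with a trained network.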