Lookalike3D: Seeing Double in 3D

arXiv cs.CV / 3/27/2026


Key Points

  • Proposes a new task: using repeated (identical or near-identical) objects as a cue, classify "lookalike" 3D object pairs in indoor scenes as identical, similar, or different.
  • Lookalike3D is a transformer that takes multiview images as input and leverages strong semantic priors from large image foundation models to effectively distinguish identical and near-identical pairs.
  • For training and evaluation, the authors built the 3DTwins dataset (derived from ScanNet++, with 76k manually annotated pairs) and report a 104% IoU improvement over baselines.
  • By treating repeated and lookalike objects as a strong cue, the method also improves downstream tasks such as joint 3D object reconstruction and part co-segmentation.

Abstract

3D object understanding and generation methods produce impressive results, yet they often overlook a pervasive source of information in real-world scenes: repeated objects. We introduce the task of lookalike object detection in indoor scenes, which leverages repeated and complementary cues from identical and near-identical object pairs. Given an input scene, the task is to classify pairs of objects as identical, similar or different using multiview images as input. To address this, we present Lookalike3D, a multiview image transformer that effectively distinguishes such object pairs by harnessing strong semantic priors from large image foundation models. To support this task, we collected the 3DTwins dataset, containing 76k manually annotated identical, similar and different pairs of objects based on ScanNet++, and show an improvement of 104% IoU over baselines. We demonstrate how our method improves downstream tasks such as enabling joint 3D object reconstruction and part co-segmentation, turning repeated and lookalike objects into a powerful cue for consistent, high-quality 3D perception. Our code, dataset and models will be made publicly available.
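To make the task concrete, here is a deliberately simplified sketch of pairwise lookalike classification. It is not the paper's method: Lookalike3D uses a multiview transformer over foundation-model features, whereas this toy version just thresholds the cosine similarity of two per-object feature vectors, and the thresholds and feature values below are invented for illustration.

```python
import math

def classify_pair(feat_a, feat_b, t_identical=0.95, t_similar=0.7):
    """Toy lookalike classifier: compare two object feature vectors
    by cosine similarity and bucket the pair into one of the three
    classes used by the task (identical / similar / different).
    Thresholds are illustrative assumptions, not from the paper."""
    dot = sum(x * y for x, y in zip(feat_a, feat_b))
    norm_a = math.sqrt(sum(x * x for x in feat_a))
    norm_b = math.sqrt(sum(y * y for y in feat_b))
    cos = dot / (norm_a * norm_b)
    if cos >= t_identical:
        return "identical"
    if cos >= t_similar:
        return "similar"
    return "different"

# Hypothetical embeddings for three objects in a scene.
chair_a = [0.90, 0.10, 0.00]
chair_b = [0.88, 0.12, 0.01]   # near-duplicate of chair_a
table   = [0.00, 0.20, 0.95]

print(classify_pair(chair_a, chair_b))  # identical
print(classify_pair(chair_a, table))    # different
```

A real system would replace the hand-set thresholds with a learned decision over multiview evidence, since two objects can look identical from one viewpoint and different from another.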