MV3DIS: Multi-View Mask Matching via 3D Guides for Zero-Shot 3D Instance Segmentation

arXiv cs.CV / 4/13/2026


Key Points

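  • The paper points out a limitation of prior zero-shot 3D instance segmentation methods: they do not fully exploit cross-view correlations or 3D priors.
  • MV3DIS proposes "3D-guided mask matching," which uses coarse 3D segments as a common reference to match 2D masks across views and strengthens consistency via 3D coverage distributions.
  • It also quantifies projection reliability with a weighting based on depth consistency, which reduces correspondence ambiguity caused by inter-object occlusion and improves robustness.
  • Extensive experiments on ScanNetV2, ScanNet200, ScanNet++, Replica, and Matterport3D are reported to show higher performance than previous methods.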
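
The matching idea in the second point can be illustrated with a small sketch. This is not the authors' implementation: the summary does not give the exact formulation, so the histogram-intersection score, the greedy matching, the threshold, and every name here (`coverage_distribution`, `pixel_to_segment`, `match_masks`) are assumptions chosen for illustration. Each 2D mask is described by the fraction of its pixels that fall on each coarse 3D segment, and masks from two views are paired when those distributions overlap strongly.

```python
from collections import Counter


def coverage_distribution(mask_pixels, pixel_to_segment):
    """Normalized histogram of coarse 3D segment ids seen under one 2D mask.

    pixel_to_segment is a hypothetical lookup from a pixel (row, col) to the
    id of the coarse 3D segment that projects onto it in this view.
    """
    counts = Counter(
        pixel_to_segment[p] for p in mask_pixels if p in pixel_to_segment
    )
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()} if total else {}


def overlap(dist_a, dist_b):
    """Histogram intersection of two coverage distributions, in [0, 1]."""
    return sum(min(w, dist_b.get(seg, 0.0)) for seg, w in dist_a.items())


def match_masks(dists_view_a, dists_view_b, threshold=0.5):
    """Greedily pair each mask in view A with its best-overlapping mask in view B.

    Masks whose best overlap is below the (assumed) threshold stay unmatched.
    """
    matches = {}
    for name_a, dist_a in dists_view_a.items():
        best, best_score = None, threshold
        for name_b, dist_b in dists_view_b.items():
            score = overlap(dist_a, dist_b)
            if score >= best_score:
                best, best_score = name_b, score
        if best is not None:
            matches[name_a] = best
    return matches


# Toy example: two views that see the same coarse segments 1, 2 and 3.
seg_a = {(0, 0): 1, (0, 1): 1, (1, 0): 2, (1, 1): 3}
seg_b = {(5, 5): 1, (5, 6): 2, (6, 6): 3}
view_a = {"a0": coverage_distribution([(0, 0), (0, 1), (1, 0)], seg_a)}
view_b = {
    "b0": coverage_distribution([(5, 5), (5, 6)], seg_b),
    "b1": coverage_distribution([(6, 6)], seg_b),
}
matches = match_masks(view_a, view_b)
```

Because both views refer to the same segment ids, the comparison never needs direct pixel correspondences between frames, which is the sense in which the coarse 3D segments act as a common reference.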

Abstract

Conventional 3D instance segmentation methods rely on labor-intensive 3D annotations for supervised training, which limits their scalability and generalization to novel objects. Recent approaches leverage multi-view 2D masks from the Segment Anything Model (SAM) to guide the merging of 3D geometric primitives, thereby enabling zero-shot 3D instance segmentation. However, these methods typically process each frame independently and rely solely on 2D metrics, such as SAM prediction scores, to produce segmentation maps. This design overlooks multi-view correlations and inherent 3D priors, leading to inconsistent 2D masks across views and ultimately fragmented 3D segmentation. In this paper, we propose MV3DIS, a coarse-to-fine framework for zero-shot 3D instance segmentation that explicitly incorporates 3D priors. Specifically, we introduce a 3D-guided mask matching strategy that uses coarse 3D segments as a common reference to match 2D masks across views and consolidates multi-view mask consistency via 3D coverage distributions. Guided by these view-consistent 2D masks, the coarse 3D segments are further refined into precise 3D instances. Additionally, we introduce a depth consistency weighting scheme that quantifies projection reliability to suppress ambiguities from inter-object occlusions, thereby improving the robustness of 3D-to-2D correspondence. Extensive experiments on the ScanNetV2, ScanNet200, ScanNet++, Replica, and Matterport3D datasets demonstrate the effectiveness of MV3DIS, which achieves superior performance over previous methods.
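
To make the depth consistency idea concrete, here is a minimal sketch of one way such a weight could be computed. The abstract does not state the actual formula, so the exponential form, the scale `tau`, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the function name are all assumptions for illustration. A 3D point is projected into a frame, and its weight decays as its depth disagrees with the sensor depth at that pixel, so points hidden behind another object contribute little.

```python
import math


def depth_consistency_weight(point_cam, depth_map, fx, fy, cx, cy, tau=0.05):
    """Weight in [0, 1] for one 3D point given in camera coordinates.

    Assumed form: exp(-|z - d| / tau), where z is the point's depth and d is
    the observed depth at the pixel it projects to. Points behind the camera,
    outside the image, or landing on invalid depth get weight 0.
    """
    x, y, z = point_cam
    if z <= 0:
        return 0.0
    # Pinhole projection to integer pixel coordinates (u = column, v = row).
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    height, width = len(depth_map), len(depth_map[0])
    if not (0 <= u < width and 0 <= v < height):
        return 0.0
    d = depth_map[v][u]
    if d <= 0:
        return 0.0
    return math.exp(-abs(z - d) / tau)


# Toy 4x4 depth map where every pixel observes a surface 2.0 m away.
depth = [[2.0] * 4 for _ in range(4)]
w_visible = depth_consistency_weight((0.0, 0.0, 2.0), depth, 1.0, 1.0, 2.0, 2.0)
w_occluded = depth_consistency_weight((0.0, 0.0, 3.0), depth, 1.0, 1.0, 2.0, 2.0)
w_behind = depth_consistency_weight((0.0, 0.0, -1.0), depth, 1.0, 1.0, 2.0, 2.0)
w_outside = depth_consistency_weight((10.0, 0.0, 1.0), depth, 1.0, 1.0, 2.0, 2.0)
```

In this toy case the point lying on the observed surface receives full weight, while the point one meter behind it is suppressed almost to zero. A soft weight of this kind, rather than a hard visibility threshold, is one plausible way to keep 3D-to-2D correspondences tolerant of depth noise.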