MonoSAOD: Monocular 3D Object Detection with Sparsely Annotated Label

arXiv cs.CV / 4/3/2026


Key Points

  • This paper addresses monocular 3D object detection in the "sparsely annotated" setting, where only a subset of objects is labeled because of the high cost of 3D annotation.
  • The first proposed component, Road-Aware Patch Augmentation (RAPA), exploits the sparse labels by compositing segmented object patches onto road regions while preserving 3D geometric consistency.
  • The second, Prototype-Based Filtering (PBF), generates and selects high-quality pseudo-labels using prototype similarity and depth uncertainty.
  • The training strategy combines geometry-preserving augmentation with prototype-guided pseudo-labeling, and experiments show that this yields robust detection even under sparse 3D supervision.
  • The source code is publicly available, so the research community can reproduce and verify the results.
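The pseudo-label selection described in the PBF key point can be sketched as a simple two-condition filter. This is a minimal illustration, not the paper's implementation: the function name, the cosine-similarity formulation, and both thresholds are assumptions.

```python
import numpy as np

def filter_pseudo_labels(roi_feats, depth_sigmas, prototype,
                         sim_thresh=0.7, sigma_thresh=0.5):
    """Keep predictions whose 2D RoI features are consistent with the
    class prototype AND whose predicted depth uncertainty is low.
    Thresholds here are illustrative, not the paper's values."""
    proto = prototype / np.linalg.norm(prototype)
    feats = roi_feats / np.linalg.norm(roi_feats, axis=1, keepdims=True)
    sims = feats @ proto  # cosine similarity to the prototype
    return (sims >= sim_thresh) & (depth_sigmas <= sigma_thresh)
```

A prediction survives only if both tests pass, so feature-inconsistent detections and detections with unreliable depth are both discarded before being used as pseudo-labels.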

Abstract

Monocular 3D object detection has achieved impressive performance on densely annotated datasets. However, it struggles when only a fraction of objects are labeled due to the high cost of 3D annotation. This sparsely annotated setting is common in real-world scenarios where annotating every object is impractical. To address this, we propose a novel framework for sparsely annotated monocular 3D object detection with two key modules. First, we propose Road-Aware Patch Augmentation (RAPA), which leverages sparse annotations by augmenting segmented object patches onto road regions while preserving 3D geometric consistency. Second, we propose Prototype-Based Filtering (PBF), which generates high-quality pseudo-labels by filtering predictions through prototype similarity and depth uncertainty. It maintains global 2D RoI feature prototypes and selects pseudo-labels that are both feature-consistent with learned prototypes and have reliable depth estimates. Our training strategy combines geometry-preserving augmentation with prototype-guided pseudo-labeling to achieve robust detection under sparse supervision. Extensive experiments demonstrate the effectiveness of the proposed method. The source code is available at https://github.com/VisualAIKHU/MonoSAOD .
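The geometry-preserving aspect of RAPA can be illustrated with a small sketch: when a labeled object patch is pasted at a new road position, its apparent size must be rescaled by the ratio of its original depth to the new depth, and its placement follows the pinhole projection of the new 3D location. Everything below (function names, the bottom-center anchoring, nearest-neighbour resizing) is an assumption for illustration, not the paper's API.

```python
import numpy as np

def project_to_image(X, K):
    """Project a 3D point in camera coordinates to pixel coordinates
    using the camera intrinsics K."""
    x = K @ np.asarray(X, dtype=float)
    return x[:2] / x[2]

def paste_patch_depth_consistent(image, patch, patch_depth, new_xyz, K):
    """Paste `patch` (originally annotated at `patch_depth`) at the 3D
    road position `new_xyz`, rescaling by the depth ratio so the pasted
    object's apparent size matches its new depth."""
    u, v = project_to_image(new_xyz, K)
    scale = patch_depth / new_xyz[2]          # farther away -> smaller
    h = max(1, int(round(patch.shape[0] * scale)))
    w = max(1, int(round(patch.shape[1] * scale)))
    # nearest-neighbour resize, no external dependencies
    ys = (np.arange(h) * patch.shape[0] / h).astype(int)
    xs = (np.arange(w) * patch.shape[1] / w).astype(int)
    resized = patch[ys][:, xs]
    # anchor the patch bottom-center at the projected ground point
    top, left = int(v) - h, int(u) - w // 2
    H, W = image.shape[:2]
    t0, l0 = max(top, 0), max(left, 0)
    b, r = min(top + h, H), min(left + w, W)
    out = image.copy()
    out[t0:b, l0:r] = resized[t0 - top:b - top, l0 - left:r - left]
    return out
```

The depth-ratio rescaling is what keeps the augmentation 3D-consistent: a patch moved from 10 m to 20 m shrinks to half its pixel size, so the synthesized scene still obeys perspective geometry.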