LOD-Net: Locality-Aware 3D Object Detection Using Multi-Scale Transformer Network

arXiv cs.CV / April 21, 2026


Key Points

  • The paper introduces LOD-Net, a locality-aware 3D object detection approach that addresses point-cloud sparsity and missing global structure.
  • It proposes a Multi-Scale Attention (MSA) mechanism integrated into the 3DETR architecture, including an upsampling operation to produce higher-resolution feature maps.
  • The method aims to improve detection of smaller objects and objects that are semantically related by better capturing local geometry alongside global context.
  • Experiments on ScanNetv2 show improved performance over the baseline: a gain of nearly 1% in mAP@25 and 4.78% in mAP@50.
  • Applying MSA to the lighter 3DETR-m variant yields limited gains, and the authors conclude that upsampling strategies must be adapted for lightweight models.

Abstract

3D object detection in point cloud data remains a challenging task due to the sparsity and lack of global structure inherent in the input. In this work, we propose a novel Multi-Scale Attention (MSA) mechanism integrated into the 3DETR architecture to better capture both local geometry and global context. Our method introduces an upsampling operation that generates high-resolution feature maps, enabling the network to better detect smaller and semantically related objects. Experiments conducted on the ScanNetv2 dataset demonstrate that our 3DETR + MSA model improves detection performance, achieving a gain of almost 1% in mAP@25 and 4.78% in mAP@50 over the baseline. While applying MSA to the 3DETR-m variant shows limited improvement, our analysis reveals the importance of adapting the upsampling strategy for lightweight models. These results highlight the effectiveness of combining hierarchical feature extraction with attention mechanisms in enhancing 3D scene understanding.
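To make the idea concrete, here is a minimal NumPy sketch of the two ingredients the abstract describes: upsampling coarse (low-resolution) point features to the fine point set, then fusing the two scales with attention. This is an illustrative toy under assumed shapes and a nearest-neighbor upsampling rule, not the paper's implementation; the actual MSA module uses learned projections inside 3DETR's transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax for the attention weights.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def upsample_features(coarse_xyz, coarse_feat, fine_xyz):
    # Toy upsampling: each fine point copies the feature of its nearest
    # coarse point (a hypothetical stand-in for the paper's operation).
    d = np.linalg.norm(fine_xyz[:, None, :] - coarse_xyz[None, :, :], axis=-1)
    return coarse_feat[d.argmin(axis=1)]

def multi_scale_attention(fine_xyz, fine_feat, coarse_xyz, coarse_feat):
    # Queries come from high-resolution (local) features; keys/values from
    # the upsampled low-resolution (global) features; attention fuses them.
    up = upsample_features(coarse_xyz, coarse_feat, fine_xyz)
    d = fine_feat.shape[-1]
    attn = softmax(fine_feat @ up.T / np.sqrt(d))
    return fine_feat + attn @ up  # residual fusion of the two scales

# Assumed shapes: 64 fine points with 16-dim features, 16 coarse points.
rng = np.random.default_rng(0)
fine_xyz = rng.normal(size=(64, 3))
fine_feat = rng.normal(size=(64, 16))
coarse_xyz = fine_xyz[::4]
coarse_feat = rng.normal(size=(16, 16))
out = multi_scale_attention(fine_xyz, fine_feat, coarse_xyz, coarse_feat)
print(out.shape)  # (64, 16): fused features at the fine resolution
```

The point of the sketch is the data flow: coarse features carry global context, upsampling brings them to the resolution where small objects are still distinguishable, and attention lets each fine point weight that context, which is the intuition behind the reported gains on smaller objects.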