RayMamba: Ray-Aligned Serialization for Long-Range 3D Object Detection

arXiv cs.CV / 4/6/2026


Key Points

  • RayMamba is proposed to improve long-range 3D object detection from sparse, fragmented far-field LiDAR by using a geometry-aware, plug-and-play enhancement for voxel-based detectors.
  • The method replaces generic serialization with a ray-aligned, sector-wise ordered sequence that preserves directional continuity and occlusion-related contextual neighborhoods for subsequent Mamba/SSM modeling.
  • RayMamba is reported to be compatible with both LiDAR-only and multimodal 3D detectors and adds only modest computational overhead.
  • Experiments on nuScenes and Argoverse 2 show consistent gains, including up to +2.49 mAP and +1.59 NDS in the 40–50 m range on nuScenes and improved VoxelNeXt results on Argoverse 2 (30.3→31.2 mAP).

Abstract

Long-range 3D object detection remains challenging because LiDAR observations become highly sparse and fragmented in the far field, making reliable context modeling difficult for existing detectors. Recent state space model (SSM)-based methods have improved long-range modeling efficiency. However, their effectiveness is still limited by generic serialization strategies that fail to preserve meaningful contextual neighborhoods in sparse scenes. To address this issue, we propose RayMamba, a geometry-aware plug-and-play enhancement for voxel-based 3D detectors. RayMamba organizes sparse voxels into sector-wise ordered sequences through a ray-aligned serialization strategy, which preserves directional continuity and occlusion-related context for subsequent Mamba-based modeling. It is compatible with both LiDAR-only and multimodal detectors, while introducing only modest overhead. Extensive experiments on nuScenes and Argoverse 2 demonstrate consistent improvements across strong baselines. In particular, RayMamba achieves gains of up to 2.49 mAP and 1.59 NDS in the challenging 40–50 m range on nuScenes, and further improves VoxelNeXt on Argoverse 2 from 30.3 to 31.2 mAP.
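
To make the idea of a ray-aligned, sector-wise ordering concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes voxel centers are given as bird's-eye-view (x, y) coordinates in the sensor frame, splits the scene into equal azimuth sectors, and orders voxels near-to-far within each sector. The function name `ray_aligned_order`, the `num_sectors` parameter, and the choice of sorting keys are all hypothetical; the abstract does not specify these details.

```python
import numpy as np


def ray_aligned_order(voxel_xy, num_sectors=8):
    """Return indices ordering voxels sector by sector, near-to-far in each sector.

    voxel_xy: (N, 2) array of voxel centers (x, y) in the sensor frame.
    num_sectors: number of equal azimuth sectors (an assumed hyperparameter).
    """
    voxel_xy = np.asarray(voxel_xy, dtype=np.float64)

    # Polar coordinates of each voxel as seen from the sensor origin.
    azimuth = np.arctan2(voxel_xy[:, 1], voxel_xy[:, 0])  # in [-pi, pi]
    radius = np.hypot(voxel_xy[:, 0], voxel_xy[:, 1])

    # Assign each voxel to an azimuth sector; clip so azimuth == pi
    # lands in the last sector rather than an out-of-range one.
    sector_width = 2.0 * np.pi / num_sectors
    sector = np.floor((azimuth + np.pi) / sector_width).astype(np.int64)
    sector = np.clip(sector, 0, num_sectors - 1)

    # np.lexsort treats the LAST key as primary: sort by sector first,
    # then by distance from the sensor within each sector.
    return np.lexsort((radius, sector))


if __name__ == "__main__":
    points = np.array([
        [30.0, 5.0],    # far, first quadrant
        [5.0, 1.0],     # near, first quadrant
        [-10.0, -2.0],  # third quadrant
        [4.0, -40.0],   # far, fourth quadrant
        [-3.0, -1.0],   # near, third quadrant
    ])
    order = ray_aligned_order(points, num_sectors=4)
    print(order.tolist())  # [4, 2, 3, 1, 0]
```

The resulting index array could be used to gather voxel features into a 1D sequence before a sequence model is applied. Under this ordering, voxels that lie in the same direction from the sensor stay adjacent in the sequence, so a near object and the sparse far-field returns behind it end up in the same neighborhood, which is the property the abstract attributes to ray-aligned serialization.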