Integrating Object Detection, LiDAR-Enhanced Depth Estimation, and Segmentation Models for Railway Environments

arXiv cs.CV / 4/17/2026

📰 News · Models & Research

Key Points

  • The paper tackles railway obstacle safety by combining object detection with distance estimation, an area that many prior studies only partially address.
  • It introduces a modular, flexible framework that jointly performs rail track identification, obstacle detection, and obstacle distance estimation by integrating three neural networks.
  • The approach uses monocular depth estimation enhanced with LiDAR point clouds, enabling more accurate spatial perception than detection-only or distance-less pipelines.
  • For reliable quantitative evaluation despite limited real-world ground truth, the authors assess the system on a synthetic dataset (SynDRA) that includes accurate ground-truth annotations.
  • The system achieves a mean absolute error (MAE) as low as 0.63 meters when fusing monocular depth maps with LiDAR, yielding accurate distance estimates together with spatial perception of the scene.
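To make the LiDAR-enhanced depth idea concrete, here is a minimal sketch (not the paper's actual method) of one common fusion strategy: a monocular network produces a dense but scale-ambiguous depth map, and sparse metric LiDAR returns are used to fit a global scale and shift by least squares. All names and the toy data below are illustrative.

```python
import numpy as np

def align_depth_to_lidar(mono_depth, lidar_depth, lidar_mask):
    """Fit scale s and shift t so that s*mono + t matches the sparse LiDAR
    depths (least squares over pixels where a LiDAR return exists)."""
    m = mono_depth[lidar_mask]           # monocular depth at LiDAR pixels
    z = lidar_depth[lidar_mask]          # metric LiDAR depth at those pixels
    A = np.stack([m, np.ones_like(m)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, z, rcond=None)
    return s * mono_depth + t            # metric-scaled dense depth map

# toy example: the monocular depth is off by scale 2 and shift 1
rng = np.random.default_rng(0)
true_depth = rng.uniform(5.0, 50.0, size=(4, 4))
mono = (true_depth - 1.0) / 2.0
mask = np.zeros((4, 4), dtype=bool)
mask[::2, ::2] = True                    # sparse LiDAR hits
fused = align_depth_to_lidar(mono, true_depth * mask, mask)
print(np.abs(fused - true_depth).max())  # ~0 after alignment
```

Because the monocular map is dense and the LiDAR is sparse, this kind of alignment recovers metric scale everywhere while keeping the depth map's full spatial coverage.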

Abstract

Obstacle detection in railway environments is crucial for ensuring safety. However, very few studies address the problem using a complete, modular, and flexible system that can both detect objects in the scene and estimate their distance from the vehicle. Most works focus solely on detection, others attempt to identify the track, and only a few estimate obstacle distances. Additionally, evaluating these systems is challenging due to the lack of ground truth data. In this paper, we propose a modular and flexible framework that identifies the rail track, detects potential obstacles, and estimates their distance by integrating three neural networks for object detection, track segmentation, and monocular depth estimation with LiDAR point clouds. To enable a reliable and quantitative evaluation, the proposed framework is assessed using a synthetic dataset (SynDRA), which provides accurate ground truth annotations, allowing for direct performance comparison with existing methods. The proposed system achieves a mean absolute error (MAE) as low as 0.63 meters by integrating monocular depth maps with LiDAR, enabling not only accurate distance estimates but also spatial perception of the scene.
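The abstract describes three outputs being combined: a track segmentation mask, detection boxes, and a metric depth map. One plausible way to wire these together (a sketch, not the paper's implementation; all names and numbers below are made up) is to report each obstacle's distance as the median depth inside its detection box restricted to the track corridor, then score the system with MAE against ground-truth distances:

```python
import numpy as np

def obstacle_distance(depth_m, box, track_mask):
    """Estimate an obstacle's distance as the median metric depth inside its
    detection box, restricted to pixels overlapping the segmented track.
    depth_m: HxW depth map (meters); box: (x0, y0, x1, y1); track_mask: HxW bool."""
    x0, y0, x1, y1 = box
    region = np.zeros_like(track_mask)
    region[y0:y1, x0:x1] = True
    pixels = depth_m[region & track_mask]
    if pixels.size == 0:                 # detection outside the track: ignore
        return None
    return float(np.median(pixels))

def mae(pred, gt):
    """Mean absolute error (meters) over matched obstacles."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(gt))))

# toy scene: 6x8 depth map with an obstacle ~20 m away on the track
depth = np.full((6, 8), 50.0)
depth[2:5, 3:6] = 20.0                   # obstacle pixels
track = np.zeros((6, 8), dtype=bool)
track[:, 2:7] = True                     # segmented track corridor
d = obstacle_distance(depth, (3, 2, 6, 5), track)
print(d)                                 # 20.0
print(mae([d], [19.5]))                  # 0.5
```

Taking the median (rather than the mean) inside the box makes the estimate robust to background pixels that leak into the bounding box.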