Railway Artificial Intelligence Learning Benchmark (RAIL-BENCH): A Benchmark Suite for Perception in the Railway Domain

arXiv cs.CV / 4/27/2026


Key Points

  • The paper introduces RAIL-BENCH, the first public benchmark suite specifically designed to evaluate camera-based perception for automated train operation on existing railway infrastructure.
  • The benchmark includes five railway perception challenges: rail track detection, object detection, vegetation segmentation, multi-object tracking, and monocular visual odometry, each adapted to railway-environment characteristics.
  • It provides curated training/test datasets from diverse real-world scenarios along with standardized evaluation metrics and public scoreboards to enable reproducible comparisons across approaches.
  • For rail track detection, the authors propose LineAP, a new segment-based average precision metric that focuses on geometric accuracy of predicted polylines while avoiding weaknesses in prior line-detection metrics.
  • The benchmark components and the scoring platform are publicly available at https://www.mrt.kit.edu/railbench.

Abstract

Automated train operation on existing railway infrastructure requires robust camera-based perception, yet the railway domain lacks public benchmark suites with standardized evaluation protocols that would enable reproducible comparison of approaches. We present RAIL-BENCH, the first perception benchmark suite for the railway domain. It comprises five challenges (rail track detection, object detection, vegetation segmentation, multi-object tracking, and monocular visual odometry), each tailored to the specific characteristics of railway environments. RAIL-BENCH provides curated training and test datasets drawn from diverse real-world scenarios, evaluation metrics, and public scoreboards (https://www.mrt.kit.edu/railbench). For the rail track detection challenge, we introduce LineAP, a novel segment-based average precision metric that evaluates the geometric accuracy of polyline predictions independently of instance-level grouping, addressing key limitations of existing line detection metrics.
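
The abstract does not give the formula for LineAP, so the following is only a rough illustration of what a segment-based average precision over polylines could look like, not the authors' definition. It rests on several assumptions: polylines are cut into pieces of fixed arc length, each piece is represented by its midpoint, a predicted piece counts as a true positive if it lies within a distance tolerance of a not-yet-matched ground-truth piece, and precision is averaged over the recall steps. The names segment_midpoints and segment_ap and the parameters step and tol are hypothetical.

```python
import math


def segment_midpoints(polyline, step):
    """Cut a polyline into pieces of arc length `step` and return their midpoints.

    A trailing remainder shorter than `step` is dropped (simplifying assumption).
    """
    # Cumulative arc length at every vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))

    mids = []
    i = 1
    for k in range(int(cum[-1] // step)):
        s = (k + 0.5) * step  # arc length of the k-th piece's midpoint
        while cum[i] < s:     # advance to the edge that contains s
            i += 1
        t = (s - cum[i - 1]) / (cum[i] - cum[i - 1])
        (x0, y0), (x1, y1) = polyline[i - 1], polyline[i]
        mids.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return mids


def segment_ap(predictions, ground_truth, step=1.0, tol=0.5):
    """Illustrative segment-level AP (NOT the paper's LineAP definition).

    predictions:  list of (polyline, confidence) pairs
    ground_truth: list of polylines
    All pieces are pooled, so how pieces are grouped into polylines is ignored.
    """
    gt = [m for pl in ground_truth for m in segment_midpoints(pl, step)]
    if not gt:
        return 0.0
    pred = [(conf, m) for pl, conf in predictions
            for m in segment_midpoints(pl, step)]
    pred.sort(key=lambda cm: cm[0], reverse=True)  # most confident first

    matched = [False] * len(gt)
    tp, ap = 0, 0.0
    for rank, (_, m) in enumerate(pred, start=1):
        # Greedily pick the nearest unmatched ground-truth piece within `tol`.
        best, best_d = None, tol
        for j, g in enumerate(gt):
            d = math.dist(m, g)
            if not matched[j] and d <= best_d:
                best, best_d = j, d
        if best is not None:
            matched[best] = True
            tp += 1
            ap += tp / rank  # precision at this new recall level
    return ap / len(gt)


gt = [[(0.0, 0.0), (10.0, 0.0)]]
one_piece = [([(0.0, 0.0), (10.0, 0.0)], 0.9)]
two_pieces = [([(0.0, 0.0), (5.0, 0.0)], 0.9), ([(5.0, 0.0), (10.0, 0.0)], 0.8)]
print(segment_ap(one_piece, gt), segment_ap(two_pieces, gt))
```

In this sketch, a rail predicted as one polyline and the same rail predicted as two abutting polylines receive the same score, which is one way to read "independently of instance-level grouping". The midpoint matching and the greedy assignment are simplifications chosen for brevity; the actual metric may handle both differently.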