Deep Learning Aided Vision System for Planetary Rovers

arXiv cs.CV · March 31, 2026


Key Points

  • The paper proposes a planetary rover vision pipeline that couples real-time perception with offline terrain reconstruction to support autonomous exploration.
  • In real time, it enhances stereo imagery with CLAHE, performs object detection using YOLOv11n, and estimates metric object distances via a dedicated neural network.
  • For offline reconstruction, it uses Depth Anything V2 to estimate monocular depth from captured images, then fuses depth maps into dense point clouds with Open3D.
  • Experiments on Chandrayaan 3 NavCam stereo imagery report a median depth error of 2.26 cm (within a 1–10 meter range) against a CAHV-based utility, while the detector shows a balanced precision–recall tradeoff on grayscale lunar scenes.
  • The authors position the overall architecture as a scalable, compute-efficient vision solution for onboard deployment across planetary rover missions.
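The CLAHE step in the real-time module boosts local contrast in the low-contrast grayscale lunar frames before detection. A minimal numpy sketch of CLAHE's core idea (tile-wise, clip-limited histogram equalization); a real pipeline would use an optimized implementation such as OpenCV's `cv2.createCLAHE`, and this sketch omits the bilinear blending between neighboring tiles that full CLAHE applies:

```python
import numpy as np

def clahe_tile(tile: np.ndarray, clip_limit: int = 40, n_bins: int = 256) -> np.ndarray:
    """Clip-limited histogram equalization of one 8-bit tile."""
    hist, _ = np.histogram(tile, bins=n_bins, range=(0, n_bins))
    excess = np.clip(hist - clip_limit, 0, None).sum()
    # Clip each bin and redistribute the clipped counts uniformly
    hist = np.minimum(hist, clip_limit) + excess // n_bins
    cdf = hist.cumsum()
    # Map intensities through the normalized CDF (0..255 lookup table)
    lut = np.round((n_bins - 1) * cdf / cdf[-1]).astype(np.uint8)
    return lut[tile]

def clahe_simple(img: np.ndarray, tile: int = 64) -> np.ndarray:
    """Apply clip-limited equalization independently per tile."""
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y:y+tile, x:x+tile] = clahe_tile(img[y:y+tile, x:x+tile])
    return out

# Low-contrast synthetic grayscale frame (stand-in for a NavCam image)
rng = np.random.default_rng(0)
frame = rng.integers(100, 140, size=(128, 128), dtype=np.uint8)
enhanced = clahe_simple(frame)
print(enhanced.dtype, enhanced.shape)
```

The enhanced frame spans a wider intensity range than the input, which is the property the detector benefits from on flat, dimly lit terrain.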

Abstract

This study presents a vision system for planetary rovers, combining real-time perception with offline terrain reconstruction. The real-time module integrates CLAHE-enhanced stereo imagery, YOLOv11n-based object detection, and a neural network that estimates object distances. The offline module uses the Depth Anything V2 metric monocular depth estimation model to generate depth maps from captured images, which are fused into dense point clouds using Open3D. Real-world distance estimates from the real-time pipeline provide reliable metric context alongside the qualitative reconstructions. Evaluation on Chandrayaan 3 NavCam stereo imagery, benchmarked against a CAHV-based utility, shows that the neural network achieves a median depth error of 2.26 cm within a 1–10 m range. The object detection model maintains a balanced precision–recall tradeoff on grayscale lunar scenes. This architecture offers a scalable, compute-efficient vision solution for autonomous planetary exploration.
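The offline module's fusion of depth maps into point clouds rests on back-projecting each metric depth value through the camera's pinhole intrinsics (the paper performs the fusion with Open3D). A minimal numpy sketch of that back-projection step; the intrinsics `fx`, `fy`, `cx`, `cy` below are illustrative placeholders, not Chandrayaan 3 NavCam calibration values:

```python
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Back-project a metric depth map (meters) into an (N, 3) point cloud
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Toy 4x4 depth map at a constant 2 m (placeholder intrinsics)
depth = np.full((4, 4), 2.0)
pts = depth_to_points(depth, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(pts.shape)  # (16, 3)
```

In an Open3D-based pipeline, the resulting array would be wrapped in a point-cloud object and clouds from multiple frames merged after alignment; the metric distances from the real-time network then give the fused reconstruction its scale reference.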