Depth-Aware Rover: A Study of Edge AI and Monocular Vision for Real-World Implementation

arXiv cs.CV / 4/27/2026


Key Points

  • The study examines depth-aware rover navigation, comparing simulated and real-world deployments that shift from stereo vision to monocular depth estimation using edge AI.
  • A Unity-based lunar terrain simulator was used with stereo cameras and OpenCV’s StereoSGBM to generate disparity maps for the stereo-vision baseline.
  • On a Raspberry Pi 4 physical rover, monocular metric depth estimation was implemented with UniDepthV2 alongside real-time object detection using YOLO12n.
  • Although stereo vision achieved higher accuracy in simulation, the monocular edge-AI approach proved more robust and cost-effective in real-world deployment, running at roughly 0.1 FPS for depth estimation and 10 FPS for object detection.

Abstract

This study analyses simulated and real-world implementations of depth-aware rover navigation, highlighting the transition from stereo vision to monocular depth estimation using edge AI. A Unity-based lunar terrain simulator with stereo cameras and OpenCV's StereoSGBM was used to generate disparity maps. A physical rover built on a Raspberry Pi 4 employed UniDepthV2 for monocular metric depth estimation and YOLO12n for real-time object detection. While stereo vision yielded higher accuracy in simulation, the monocular approach proved more robust and cost-effective in real-world deployment, achieving 0.1 FPS for depth estimation and 10 FPS for detection.
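
For context on how a disparity map becomes metric depth in a stereo rig: with focal length f (in pixels) and camera baseline B, depth follows Z = f·B/d, where d is the disparity in pixels. A minimal sketch of this conversion (the focal length and baseline values are illustrative, not taken from the paper):

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Convert a disparity (pixels) to metric depth via Z = f * B / d.

    Larger disparities mean closer objects; zero disparity would place
    the point at infinity, so we guard against non-positive values.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 700 px focal length, 10 cm baseline.
# A 35 px disparity then corresponds to a depth of 2.0 m.
print(depth_from_disparity(35.0, focal_px=700.0, baseline_m=0.10))  # 2.0
```

This inverse relationship is also why stereo accuracy degrades at range: at large depths a small disparity error produces a large depth error, one motivation for learned monocular metric depth on the physical rover.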