Visual-RRT: Finding Paths toward Visual-Goals via Differentiable Rendering

arXiv cs.RO · April 21, 2026


Key Points

  • The paper introduces Visual-RRT (vRRT), a motion-planning method that performs visual-goal planning when target configurations are given as images or videos instead of explicit joint angles.
  • vRRT combines sampling-based exploration from Rapidly-exploring Random Trees (RRTs) with gradient-based exploitation using differentiable robot rendering.
  • It proposes a frontier-based exploration-exploitation strategy that adaptively emphasizes visually promising regions during search.
  • It also presents inertial gradient tree expansion, which reuses optimization states across branches to keep gradient exploitation consistent (momentum-like behavior).
  • Experiments on multiple robot manipulators (including Franka, UR5e, and Fetch) show the approach works in both simulation and real-world settings, and the authors provide an open-source code repository.
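The core idea in the points above — alternating RRT-style random expansion with gradient steps that pull new nodes toward the visual goal — can be sketched in a toy form. This is not the authors' implementation: a 2-link planar arm's forward kinematics stands in for the differentiable renderer, a squared feature distance stands in for the image loss, the gradient is taken by finite differences rather than backpropagation, and the frontier prioritization and inertial (momentum-inherited) expansion from the paper are omitted. All function names here are hypothetical.

```python
import math
import random

def fk(q):
    # Toy 2-link planar arm forward kinematics. Stand-in for a differentiable
    # renderer: maps a joint configuration to an observable feature (here,
    # the end-effector position instead of a rendered image).
    x = math.cos(q[0]) + math.cos(q[0] + q[1])
    y = math.sin(q[0]) + math.sin(q[0] + q[1])
    return (x, y)

def visual_loss(q, goal_feat):
    # Squared distance in feature space; a stand-in for an image-space loss
    # against the goal image or demonstration frame.
    x, y = fk(q)
    return (x - goal_feat[0]) ** 2 + (y - goal_feat[1]) ** 2

def grad(q, goal_feat, eps=1e-5):
    # Central finite differences; a real system would backpropagate through
    # the differentiable renderer instead.
    g = []
    for i in range(len(q)):
        qp, qm = list(q), list(q)
        qp[i] += eps
        qm[i] -= eps
        g.append((visual_loss(qp, goal_feat) - visual_loss(qm, goal_feat)) / (2 * eps))
    return g

def vrrt_sketch(start, goal_feat, iters=300, step=0.2, lr=0.1, seed=0):
    # Hypothetical sketch of the sampling + gradient combination:
    # grow a tree by random extension (exploration), then refine each new
    # node with a few gradient steps on the visual loss (exploitation).
    rng = random.Random(seed)
    nodes = [list(start)]
    best = list(start)
    for _ in range(iters):
        # Exploration: sample a random configuration, extend from nearest node.
        sample = [rng.uniform(-math.pi, math.pi) for _ in range(len(start))]
        near = min(nodes, key=lambda n: sum((a - b) ** 2 for a, b in zip(n, sample)))
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(near, sample))) or 1.0
        new = [n + step * (s - n) / d for n, s in zip(near, sample)]
        # Exploitation: a few gradient steps toward the visual goal.
        for _ in range(5):
            g = grad(new, goal_feat)
            new = [q - lr * gi for q, gi in zip(new, g)]
        nodes.append(new)
        if visual_loss(new, goal_feat) < visual_loss(best, goal_feat):
            best = new
    return best

best = vrrt_sketch([0.0, 0.0], goal_feat=(1.0, 1.0))
```

In this toy version the gradient steps alone would often suffice; the sampling step matters when the loss landscape has local minima or the workspace has obstacles, which is where the paper's frontier-based prioritization and inertial expansion come in.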

Abstract

Rapidly-exploring random trees (RRTs) have been widely adopted for robot motion planning due to their robustness and theoretical guarantees. However, existing RRT-based planners require explicit goal configurations specified as numerical joint angles, while many practical applications provide goal specifications through visual observations such as images or demonstration videos where precise goal configurations are unavailable. In this paper, we propose visual-RRT (vRRT), a motion planner that enables visual-goal planning by unifying gradient-based exploitation from differentiable robot rendering with sampling-based exploration from RRTs. We further introduce (i) a frontier-based exploration-exploitation strategy that adaptively prioritizes visually promising search regions, and (ii) inertial gradient tree expansion that inherits optimization states across tree branches for momentum-consistent gradient exploitation. Extensive experiments across various robot manipulators including Franka, UR5e, and Fetch demonstrate that vRRT achieves effective visual-goal planning in both simulated and real-world settings, bridging the gap between sampling-based planning and vision-centric robot applications. Our code is available at https://sgvr.kaist.ac.kr/Visual-RRT.