Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models

arXiv cs.RO / 4/16/2026


Key Points

  • The paper presents a fully autonomous hierarchical framework for long-horizon routing of deformable linear objects (e.g., cables and ropes), which require long-term planning and reliable multi-skill execution.
  • It converts language-specified routing goals into high-level plans using vision-language models for in-context reasoning, then relies on reinforcement learning to execute low-level manipulation skills.
  • To handle robustness over long horizons, the method includes a failure recovery mechanism that reorients the DLO into insertion-feasible states when errors occur.
  • The approach is reported to generalize across diverse scenes and command styles (including implicit language and spatial descriptions) and achieves a 92% overall success rate on long-horizon routing scenarios.
  • The work is accompanied by a project page and released as an arXiv update, positioning it as an applied research contribution to robot manipulation of deformable objects.
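The hierarchical loop described above (VLM planning on top, RL skills below, with recovery on failure) can be sketched roughly as follows. This is a minimal illustrative mock, not the paper's implementation: the planner, skill executor, recovery routine, and all names (`vlm_plan`, `run_skill`, `recover`, the clip labels) are assumptions introduced here for clarity.

```python
import random

random.seed(0)  # deterministic demo of the stochastic skill outcomes


def vlm_plan(goal: str) -> list:
    """Stand-in for in-context VLM reasoning: decompose a language
    routing goal into an ordered list of (skill, clip) steps."""
    clips = ["clip_A", "clip_B", "clip_C"]
    return [("grasp", clips[0])] + [("insert", c) for c in clips]


def run_skill(skill: str, clip: str) -> bool:
    """Stand-in for an RL-trained low-level skill. Insertions fail
    occasionally to mimic error accumulation over long horizons."""
    if skill == "insert":
        return random.random() < 0.8  # mock 80% per-attempt success
    return True


def recover(clip: str) -> None:
    """Mock of the failure-recovery mechanism: reorient the DLO
    into an insertion-feasible state before retrying."""
    pass


def route(goal: str, max_retries: int = 3) -> bool:
    """Execute the high-level plan step by step, invoking recovery
    and retrying whenever a skill fails."""
    for skill, clip in vlm_plan(goal):
        for _ in range(max_retries):
            if run_skill(skill, clip):
                break
            recover(clip)
        else:
            return False  # step exhausted its retries
    return True


print(route("route the cable through the three clips on the left"))
```

The key structural point the sketch captures is the separation of concerns: the planner is called once per goal, while execution and recovery run in an inner retry loop, so a single failed insertion does not abort the whole long-horizon task.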

Abstract

Long-horizon routing tasks of deformable linear objects (DLOs), such as cables and ropes, are common in industrial assembly lines and everyday life. These tasks are particularly challenging because they require robots to manipulate DLOs with long-horizon planning and reliable skill execution. Successfully completing such tasks demands adapting to their nonlinear dynamics, decomposing abstract routing goals, and generating multi-step plans composed of multiple skills, all of which require accurate high-level reasoning during execution. In this paper, we propose a fully autonomous hierarchical framework for solving challenging DLO routing tasks. Given an implicit or explicit routing goal expressed in language, our framework leverages vision-language models (VLMs) for in-context high-level reasoning to synthesize feasible plans, which are then executed by low-level skills trained via reinforcement learning. To improve robustness over long horizons, we further introduce a failure recovery mechanism that reorients the DLO into insertion-feasible states. Our approach generalizes to diverse scenes involving object attributes, spatial descriptions, implicit language commands, and extended 5-clip settings. It achieves an overall success rate of 92% across long-horizon routing scenarios. Please refer to our project page: https://icra2026-dloroute.github.io/DLORoute/