Closed-Loop Verbal Reinforcement Learning for Task-Level Robotic Planning

arXiv cs.RO / 2026/3/24

Key Points

  • The paper introduces a closed-loop Verbal Reinforcement Learning (VRL) framework for interpretable task-level robotic planning under execution uncertainty.
  • It refines executable Behavior Trees by using an LLM actor guided by structured natural-language feedback from a Vision-Language Model critic that analyzes the robot’s observations and execution traces.
  • Unlike conventional reinforcement learning, VRL updates policies directly at the symbolic planning level rather than through gradient-based optimization, aiming for transparent reasoning and explicit causal feedback.
  • The framework is validated on a real mobile robot completing a multi-stage manipulation-and-navigation task, showing explainable policy improvements and adaptation to execution failures.
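
To make "executable Behavior Tree" and "execution trace" concrete, here is a minimal Python sketch of a behavior tree with the two standard composite nodes (Sequence and Fallback) that records a textual trace as it runs. It is an illustration only, not the paper's implementation: the class names, the action names (`navigate_to_table`, `grasp_top`, and so on), and the trace format are all assumptions, and the `RUNNING` status used by real behavior-tree libraries is omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class Action:
    """Leaf node: runs a callable and records the outcome in the trace."""
    name: str
    run: Callable[[], bool]

    def tick(self, trace: List[str]) -> Status:
        status = Status.SUCCESS if self.run() else Status.FAILURE
        trace.append(f"{self.name}: {status.value}")
        return status


@dataclass
class Sequence:
    """Succeeds only if every child succeeds; stops at the first failure."""
    children: list = field(default_factory=list)

    def tick(self, trace: List[str]) -> Status:
        for child in self.children:
            if child.tick(trace) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS


@dataclass
class Fallback:
    """Tries children in order; succeeds at the first child that succeeds."""
    children: list = field(default_factory=list)

    def tick(self, trace: List[str]) -> Status:
        for child in self.children:
            if child.tick(trace) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE


def demo_trace() -> List[str]:
    # Hypothetical task: the lambdas stand in for real robot skills.
    tree = Sequence([
        Action("navigate_to_table", lambda: True),
        Fallback([
            Action("grasp_top", lambda: False),   # simulated execution failure
            Action("grasp_side", lambda: True),   # fallback strategy succeeds
        ]),
        Action("navigate_to_bin", lambda: True),
    ])
    trace: List[str] = []
    tree.tick(trace)
    return trace


if __name__ == "__main__":
    print("\n".join(demo_trace()))
```

A trace like the one collected here is the kind of record a critic could inspect alongside camera observations to explain why a step failed.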

Abstract

We propose a new Verbal Reinforcement Learning (VRL) framework for interpretable task-level planning in mobile robotic systems operating under execution uncertainty. The framework follows a closed-loop architecture that enables iterative policy improvement through interaction with the physical environment. In our framework, executable Behavior Trees are repeatedly refined by a Large Language Model actor using structured natural-language feedback produced by a Vision-Language Model critic that observes the physical robot and execution traces. Unlike conventional reinforcement learning, policy updates in VRL occur directly at the symbolic planning level, without gradient-based optimization. This enables transparent reasoning, explicit causal feedback, and human-interpretable policy evolution. We validate the proposed framework on a real mobile robot performing a multi-stage manipulation and navigation task under execution uncertainty. Experimental results show that the framework supports explainable policy improvements, closed-loop adaptation to execution failures, and reliable deployment on physical robotic systems.
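
The closed loop described in the abstract (execute, critique, refine, repeat) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' system: the plan is a flat list of action names standing in for a Behavior Tree, `execute` simulates the robot, and `critic` and `actor` are rule-based stubs standing in for the VLM and LLM calls. All names, including `PREREQS` and `align_gripper`, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Plan = List[str]  # simplified stand-in for a Behavior Tree: ordered action names

# Hypothetical hidden requirement of the simulated world: grasping fails
# unless the gripper was aligned earlier in the same run.
PREREQS: Dict[str, str] = {"grasp_object": "align_gripper"}


def execute(plan: Plan) -> Tuple[bool, List[str]]:
    """Simulated robot: runs steps in order and stops at the first failure."""
    done, trace = set(), []
    for step in plan:
        need = PREREQS.get(step)
        ok = need is None or need in done
        trace.append(f"{step}: {'ok' if ok else 'failed'}")
        if not ok:
            return False, trace
        done.add(step)
    return True, trace


def critic(trace: List[str]) -> Dict[str, Optional[str]]:
    """Stub for the VLM critic: returns structured verbal feedback.

    A real critic would infer the cause from images and the trace; this stub
    reads PREREQS directly, which is a shortcut for illustration only.
    """
    step, outcome = trace[-1].split(": ")
    if outcome == "ok":
        return {"failed_step": None, "insert": None, "advice": "Task completed."}
    fix = PREREQS[step]
    return {
        "failed_step": step,
        "insert": fix,
        "advice": f"'{step}' failed because '{fix}' was not run first.",
    }


def actor(plan: Plan, feedback: Dict[str, Optional[str]]) -> Plan:
    """Stub for the LLM actor: edits the plan symbolically, no gradients."""
    i = plan.index(feedback["failed_step"])
    return plan[:i] + [feedback["insert"]] + plan[i:]


def refine(plan: Plan, max_iters: int = 5) -> Tuple[Plan, List[Tuple[Plan, str]]]:
    """Closed loop: execute, critique, refine; keeps a readable history."""
    history = []
    for _ in range(max_iters):
        success, trace = execute(plan)
        feedback = critic(trace)
        history.append((list(plan), feedback["advice"]))
        if success:
            break
        plan = actor(plan, feedback)
    return plan, history


if __name__ == "__main__":
    final, history = refine(["navigate_to_table", "grasp_object", "navigate_to_bin"])
    for plan, advice in history:
        print(plan, "->", advice)
```

The `history` list is the point of the sketch: every policy change is paired with the verbal feedback that caused it, which is the kind of human-interpretable policy evolution the abstract refers to.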