RoboClaw: An Agentic Framework for Scalable Long-Horizon Robotic Tasks

arXiv cs.RO / 4/2/2026


Key Points

  • RoboClaw addresses the challenge of scaling long-horizon tasks in Vision-Language-Action (VLA)-based robotics by proposing an agentic framework that unifies data collection, policy learning, and execution under a single VLM-driven controller.
  • At the policy level, Entangled Action Pairs (EAP) couple forward manipulation behaviors with inverse recovery actions to form self-resetting loops, enabling continuous on-policy data acquisition and iterative policy refinement while minimizing human intervention.
  • At deployment, the same agent performs high-level reasoning and dynamically orchestrates the learned policy primitives to accomplish long-horizon tasks.
  • By keeping contextual semantics consistent across collection and execution, the framework reduces the phase mismatch and multi-policy brittleness of conventional pipelines, improving both success rate and human effort on real-world manipulation tasks (a 25% improvement in success rate and a 53.7% reduction in human time investment).
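The self-resetting loop behind EAP can be sketched as follows. This is a minimal illustration of the idea only, not RoboClaw's implementation: all class and function names (`Episode`, `EntangledActionPair`, `collect_on_policy`, the `execute` callback) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One recorded attempt: the action log and whether the task succeeded."""
    actions: list = field(default_factory=list)
    success: bool = False

@dataclass
class EntangledActionPair:
    """Couples a forward manipulation skill with its inverse recovery skill,
    e.g. open_drawer / close_drawer."""
    forward: str
    inverse: str

def collect_on_policy(pair: EntangledActionPair, execute, num_episodes: int):
    """Alternate the forward skill with its inverse so the scene resets
    itself, yielding on-policy episodes without manual environment resets."""
    dataset = []
    for _ in range(num_episodes):
        ep = Episode()
        ep.success = execute(pair.forward, ep.actions)  # attempt the task
        execute(pair.inverse, ep.actions)               # recovery doubles as reset
        dataset.append(ep)
    return dataset
```

Because each episode ends with the inverse action restoring the scene, the loop can run unattended and feed iterative policy refinement.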

Abstract

Vision-Language-Action (VLA) systems have shown strong potential for language-driven robotic manipulation. However, scaling them to long-horizon tasks remains challenging. Existing pipelines typically separate data collection, policy learning, and deployment, resulting in heavy reliance on manual environment resets and brittle multi-policy execution. We present RoboClaw, an agentic robotics framework that unifies data collection, policy learning, and task execution under a single VLM-driven controller. At the policy level, RoboClaw introduces Entangled Action Pairs (EAP), which couple forward manipulation behaviors with inverse recovery actions to form self-resetting loops for autonomous data collection. This mechanism enables continuous on-policy data acquisition and iterative policy refinement with minimal human intervention. During deployment, the same agent performs high-level reasoning and dynamically orchestrates learned policy primitives to accomplish long-horizon tasks. By maintaining consistent contextual semantics across collection and execution, RoboClaw reduces mismatch between the two phases and improves multi-policy robustness. Experiments in real-world manipulation tasks demonstrate improved stability and scalability compared to conventional open-loop pipelines, while significantly reducing human effort throughout the robot lifecycle, achieving a 25% improvement in success rate over baseline methods on long-horizon tasks and reducing human time investment by 53.7%.
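The deployment-time orchestration described above can be sketched as below. This is an assumed, simplified stand-in: in RoboClaw the planning role is played by the VLM agent, whereas here `plan` is a trivial keyword-matching stub, and all names (`plan`, `run_long_horizon`, the `primitives` mapping) are illustrative.

```python
def plan(instruction: str, primitives: dict) -> list:
    """Stub planner: select primitives whose names appear in the instruction.
    A VLM-driven controller would do this via high-level reasoning instead."""
    return [name for name in primitives if name in instruction]

def run_long_horizon(instruction: str, primitives: dict) -> list:
    """Decompose a long-horizon instruction and dispatch learned policy
    primitives in sequence, recording each step's outcome."""
    trace = []
    for name in plan(instruction, primitives):
        result = primitives[name]()  # invoke the learned policy primitive
        trace.append((name, result))
    return trace
```

The key design point is that the same controller that supervised data collection also sequences the primitives at execution time, which is how the framework keeps contextual semantics consistent across the two phases.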