3D Dynamics-Aware Manipulation: Endowing Manipulation Policies with 3D Foresight

arXiv cs.RO / 3/27/2026



Key Points

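The authors point out that existing methods built on 2D visual dynamics lack robustness in manipulation tasks that involve large movement along the depth direction.
They propose a "3D dynamics-aware manipulation" framework that integrates 3D world modeling with policy learning, giving the manipulation policy 3D foresight.
Within the framework, three self-supervised learning tasks (current depth estimation, future RGB-D prediction, and 3D flow prediction) are introduced; they complement one another in learning 3D prediction ability.
Extensive experiments in simulation and the real world are reported to show a substantial gain in manipulation performance without slowing inference.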

Abstract

The incorporation of world modeling into manipulation policy learning has pushed the boundary of manipulation performance. However, existing efforts simply model the 2D visual dynamics, which is insufficient for robust manipulation when target tasks involve prominent depth-wise movement. To address this, we present a 3D dynamics-aware manipulation framework that seamlessly integrates 3D world modeling and policy learning. Three self-supervised learning tasks (current depth estimation, future RGB-D prediction, 3D flow prediction) are introduced within our framework, which complement each other and endow the policy model with 3D foresight. Extensive experiments on simulation and the real world show that 3D foresight can greatly boost the performance of manipulation policies without sacrificing inference speed. Code is available at https://github.com/Stardust-hyx/3D-Foresight.
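
To make the abstract's point about depth-wise movement concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard pinhole camera model and hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`); the helper names `backproject` and `flow_2d_and_3d` are illustrative only and do not come from the paper or its repository. It shows a point moving straight toward the camera along the optical axis: its 2D image-plane flow is zero, while its 3D flow is not.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera without lens distortion.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def flow_2d_and_3d(obs_now, obs_next, fx, fy, cx, cy):
    """Compare 2D and 3D displacement of one tracked physical point.

    obs_now and obs_next are (u, v, depth) observations of the same point
    at two time steps (hypothetical input format for this sketch).
    """
    (u0, v0, d0), (u1, v1, d1) = obs_now, obs_next
    flow2d = (u1 - u0, v1 - v0)  # pixel displacement only
    p0 = backproject(u0, v0, d0, fx, fy, cx, cy)
    p1 = backproject(u1, v1, d1, fx, fy, cx, cy)
    flow3d = tuple(b - a for a, b in zip(p0, p1))  # metric displacement
    return flow2d, flow3d


# Hypothetical intrinsics for a 640x480 camera.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# A point at the image center moves 0.2 m toward the camera.
flow2d, flow3d = flow_2d_and_3d(
    (320.0, 240.0, 1.0), (320.0, 240.0, 0.8), fx, fy, cx, cy
)
print("2D flow:", flow2d)  # zero in both pixel axes
print("3D flow:", flow3d)  # roughly -0.2 m along the depth axis
```

In this toy case a model that sees only 2D dynamics receives no motion signal at all, which is consistent with the motivation the abstract gives for predicting depth and 3D flow alongside future frames.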