Scalable Trajectory Generation for Whole-Body Mobile Manipulation

arXiv cs.RO / 4/15/2026


Key Points

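  • Whole-body mobile manipulation, which controls a mobile base and an arm simultaneously, has a state space that grows combinatorially with scene and object diversity. It therefore needs large-scale, physically valid trajectory data, which has so far been either labor-intensive or computationally prohibitive to acquire.
  • AutoMoMa is a GPU-accelerated framework that combines AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization to remove the bottleneck in large-scale data generation.
  • AutoMoMa generates 5,000 episodes per GPU-hour, producing over 500k physically valid trajectories across 330 scenes, diverse articulated objects, and multiple robot embodiments, and runs over 80× faster than CPU-based baselines.
  • Imitation learning (IL) policies trained on the generated data show that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach approximately 80% success, indicating that data scarcity, rather than algorithmic limitations, has been the binding constraint.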
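
As a rough sanity check on the throughput figures quoted above, the arithmetic can be sketched in a few lines. This is a back-of-the-envelope estimate only: it assumes one valid trajectory per episode and that the quoted speedup compares hourly rates directly, neither of which the summary states. The function names are hypothetical.

```python
# Figures quoted in the text; both are lower bounds there ("over"), so the
# derived numbers are rough estimates, not reported results.
EPISODES_PER_GPU_HOUR = 5_000   # generation rate
DATASET_SIZE = 500_000          # trajectories in the dataset
SPEEDUP_OVER_CPU = 80           # GPU vs. CPU-based baselines


def gpu_hours_needed(n_trajectories: int, rate: float) -> float:
    """GPU-hours to generate n_trajectories, assuming one trajectory per episode."""
    return n_trajectories / rate


def implied_cpu_rate(gpu_rate: float, speedup: float) -> float:
    """Baseline episodes per hour implied by the quoted speedup (an assumption)."""
    return gpu_rate / speedup


gpu_hours = gpu_hours_needed(DATASET_SIZE, EPISODES_PER_GPU_HOUR)
cpu_rate = implied_cpu_rate(EPISODES_PER_GPU_HOUR, SPEEDUP_OVER_CPU)
cpu_hours = DATASET_SIZE / cpu_rate

print(f"GPU-hours for the dataset: {gpu_hours:.0f}")
print(f"Implied baseline rate: {cpu_rate:.1f} episodes/hour")
print(f"Baseline hours for the dataset: {cpu_hours:.0f}")
```

Under these assumptions the dataset corresponds to roughly 100 GPU-hours, against about 8,000 hours at the implied baseline rate.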

Abstract

Robots deployed in unstructured environments must coordinate whole-body motion -- simultaneously moving a mobile base and arm -- to interact with the physical world. This coupled mobility and dexterity yields a state space that grows combinatorially with scene and object diversity, demanding datasets far larger than those sufficient for fixed-base manipulation. Yet existing acquisition methods, including teleoperation and planning, are either labor-intensive or computationally prohibitive at scale. The core bottleneck is the lack of a scalable pipeline for generating large-scale, physically valid, coordinated trajectory data across diverse embodiments and environments. Here we introduce AutoMoMa, a GPU-accelerated framework that unifies AKR modeling, which consolidates base, arm, and object kinematics into a single chain, with parallelized trajectory optimization. AutoMoMa achieves 5,000 episodes per GPU-hour (over 80× faster than CPU-based baselines), producing a dataset of over 500k physically valid trajectories spanning 330 scenes, diverse articulated objects, and multiple robot embodiments. Prior datasets were forced to compromise on scale, diversity, or kinematic fidelity; AutoMoMa addresses all three simultaneously. Training downstream IL policies further reveals that even a single articulated-object task requires tens of thousands of demonstrations for SOTA methods to reach ≈80% success, confirming that data scarcity -- not algorithmic limitations -- has been the binding constraint. AutoMoMa thus bridges high-performance planning and reliable IL-based control, providing the infrastructure previously missing for coordinated mobile manipulation research. By making large-scale, kinematically valid training data practical, AutoMoMa showcases generalizable whole-body robot policies capable of operating in the diverse, unstructured settings of the real world.
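
To make the "single chain" idea concrete, here is a minimal planar sketch. It is not AutoMoMa's AKR formulation, which the abstract does not detail. It assumes a toy setup: the base contributes three virtual joints (x, y, yaw), the arm has two revolute joints, and a rigidly grasped door adds one hinge joint at the end of the same chain. All names, link lengths, and the door geometry are hypothetical.

```python
import math


def compose(a, b):
    """Compose planar rigid transforms (x, y, theta): apply b in the frame of a."""
    ax, ay, at = a
    bx, by, bt = b
    return (
        ax + math.cos(at) * bx - math.sin(at) * by,
        ay + math.sin(at) * bx + math.cos(at) * by,
        at + bt,
    )


def chain_tip(q, l1=0.4, l2=0.3, door_width=0.6):
    """Forward kinematics of one chain: base -> arm -> grasped door -> hinge.

    q = (base_x, base_y, base_yaw, shoulder, elbow, door_angle).
    Link lengths and door width are illustrative values, not from the paper.
    """
    bx, by, byaw, shoulder, elbow, door = q
    pose = (bx, by, byaw)                          # base as three virtual joints
    pose = compose(pose, (0.0, 0.0, shoulder))     # shoulder joint
    pose = compose(pose, (l1, 0.0, 0.0))           # upper-arm link
    pose = compose(pose, (0.0, 0.0, elbow))        # elbow joint
    pose = compose(pose, (l2, 0.0, 0.0))           # forearm link, gripper at handle
    pose = compose(pose, (door_width, 0.0, 0.0))   # door panel, handle to hinge
    pose = compose(pose, (0.0, 0.0, door))         # object's hinge joint
    return pose


def hinge_residual(q, hinge_pose):
    """Difference between the chain tip and the known, fixed hinge pose."""
    tip = chain_tip(q)
    return tuple(t - h for t, h in zip(tip, hinge_pose))


straight = chain_tip((0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
turned = chain_tip((0.0, 0.0, math.pi / 2, 0.0, 0.0, 0.0))
```

In this toy model, a whole-body configuration is valid when `hinge_residual` is zero, so base motion, arm motion, and door opening are all expressed as joint values of one chain that a single optimizer can treat uniformly.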