Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers

arXiv cs.RO / 4/15/2026


Key Points

  • The paper presents WHOLE-MoMa, a two-stage approach for whole-body mobile manipulation that leverages a sub-optimal whole-body controller as a structural prior rather than relying on teleoperation or heavy reward engineering.
  • It generates diverse demonstrations by randomizing a lightweight WBC, then applies offline reinforcement learning to discover and “stitch” together improved behaviors, guided by a reward signal.
  • To handle complex coordination, the method extends offline implicit Q-learning (IQL) with Q-chunking, so the critic evaluates entire action chunks, and uses advantage-weighted policy extraction to train action-chunked diffusion policies.
  • In simulation on increasingly difficult tasks with a TIAGo++ mobile manipulator, WHOLE-MoMa outperforms hierarchical WBCs, behavior cloning, and multiple offline RL baselines.
  • The learned policies transfer zero-shot to a real robot, achieving 80% success on bimanual drawer manipulation and 68% on simultaneous cupboard opening and object placement, without any real-world training data or finetuning.
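The chunk-level critic mentioned above can be sketched minimally. This is an illustrative, pure-Python rendering of two standard ingredients, not the paper's implementation: IQL's asymmetric expectile loss for the value function, and an n-step TD target where the critic scores an entire action chunk of length h (function names and defaults are assumptions):

```python
def expectile_loss(diffs, tau=0.7):
    # IQL's asymmetric squared loss for the value update.
    # diffs[i] = Q(s_i, a_chunk_i) - V(s_i); tau > 0.5 pushes V toward an
    # upper expectile of Q, approximating the in-support maximum.
    weights = [tau if d > 0 else 1.0 - tau for d in diffs]
    return sum(w * d * d for w, d in zip(weights, diffs)) / len(diffs)

def chunked_td_target(rewards, v_next, gamma=0.99):
    # n-step target for a chunk-level critic (Q-chunking): the discounted sum
    # of the h rewards collected inside the chunk, plus gamma**h * V(s_{t+h})
    # bootstrapped at the state where the chunk ends.
    h = len(rewards)
    return sum(gamma**k * r for k, r in enumerate(rewards)) + gamma**h * v_next
```

With tau = 0.5 the expectile loss reduces to ordinary mean squared error; raising tau toward 1 is what lets IQL estimate a maximum over dataset actions without querying out-of-distribution action chunks.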

Abstract

Mobile Manipulation (MoMa) of articulated objects, such as opening doors, drawers, and cupboards, demands simultaneous, whole-body coordination between a robot's base and arms. Classical whole-body controllers (WBCs) can solve such problems via hierarchical optimization, but require extensive hand-tuning and remain brittle. Learning-based methods, on the other hand, show strong generalization capabilities but typically rely on expensive whole-body teleoperation data or heavy reward engineering. We observe that even a sub-optimal WBC is a powerful structural prior: it can be used to collect data in a constrained, task-relevant region of the state-action space, and its behavior can still be improved upon using offline reinforcement learning. Building on this, we propose WHOLE-MoMa, a two-stage pipeline that first generates diverse demonstrations by randomizing a lightweight WBC, and then applies offline RL to identify and stitch together improved behaviors via a reward signal. To support the expressive action-chunked diffusion policies needed for complex coordination tasks, we extend offline implicit Q-learning with Q-chunking for chunk-level critic evaluation and advantage-weighted policy extraction. On three tasks of increasing difficulty using a TIAGo++ mobile manipulator in simulation, WHOLE-MoMa significantly outperforms WBC, behavior cloning, and several offline RL baselines. Policies transfer directly to the real robot without finetuning, achieving 80% success in bimanual drawer manipulation and 68% in simultaneous cupboard opening and object placement, all without any teleoperated or real-world training data.