Whole-Body Mobile Manipulation using Offline Reinforcement Learning on Sub-optimal Controllers
arXiv cs.RO / 4/15/2026
Key Points
- The paper presents WHOLE-MoMa, a two-stage approach for whole-body mobile manipulation that leverages a sub-optimal whole-body controller as a structural prior rather than relying on teleoperation or heavy reward engineering.
- It generates diverse demonstrations by randomizing a lightweight whole-body controller (WBC), then applies offline reinforcement learning with a learned reward signal to discover and "stitch" together improved behaviors (see the data-generation sketch after this list).
- To handle complex coordination, the method extends offline implicit Q-learning with Q-chunking, training critics over action chunks and extracting an action-chunked diffusion policy via advantage weighting (both steps are sketched in code after this list).
- In simulation on increasingly difficult tasks with a TIAGo++ mobile manipulator, WHOLE-MoMa outperforms hierarchical WBCs, behavior cloning, and multiple offline RL baselines.
- The learned policies transfer directly to a real robot without fine-tuning or any real-world training data, achieving 80% success on bimanual drawer manipulation and 68% on simultaneous cupboard opening and object placement.
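A minimal sketch of the data-generation stage, assuming a generic `env`/`wbc` interface; the controller knobs (`base_gain`, `arm_gain`, `task_weights`) and sampling ranges are hypothetical. The idea from the paper is that each episode runs the same sub-optimal controller under freshly sampled parameters, so the dataset covers a spread of behaviors for offline RL to stitch together; rewards are labeled afterwards by the learned reward model, not by the simulator.

```python
import numpy as np

def collect_wbc_demos(env, wbc, n_episodes=500, horizon=300, seed=0):
    """Roll out a lightweight whole-body controller with randomized
    parameters to build a diverse offline dataset (hypothetical API)."""
    rng = np.random.default_rng(seed)
    dataset = []
    for _ in range(n_episodes):
        # Randomize controller parameters per episode (assumed knobs).
        wbc.set_params(
            base_gain=rng.uniform(0.5, 2.0),
            arm_gain=rng.uniform(0.5, 2.0),
            task_weights=rng.dirichlet(np.ones(3)),
        )
        obs = env.reset()
        traj = []
        for _ in range(horizon):
            action = wbc.act(obs)  # sub-optimal but feasible action
            # gym-style step; the env reward is discarded because the
            # dataset is relabeled with a learned reward model later.
            next_obs, _, done, _ = env.step(action)
            traj.append((obs, action, next_obs, done))
            obs = next_obs
            if done:
                break
        dataset.append(traj)
    return dataset
```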
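The chunk-level critic can be pictured as standard implicit Q-learning in which the "action" is a flattened H-step chunk and the Bellman backup skips H environment steps at once. A sketch under assumed shapes and hyperparameters (`GAMMA`, `TAU`, `H`, and the batch layout are illustrative, not the paper's values); target networks are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA, TAU, H = 0.99, 0.7, 8  # discount, expectile, chunk length (assumed)

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

def expectile_loss(diff, tau=TAU):
    # Asymmetric L2: positive errors weighted by tau, negative by 1 - tau.
    weight = torch.where(diff > 0, tau, 1.0 - tau)
    return (weight * diff.pow(2)).mean()

def iql_chunk_losses(q_net, v_net, batch):
    """One IQL update with chunk-level critics.

    Assumed batch fields:
      s       : (B, s_dim)      state at chunk start
      a_chunk : (B, H * a_dim)  flattened H-step action chunk
      r_chunk : (B, H)          per-step (learned) rewards in the chunk
      s_next  : (B, s_dim)      state after the chunk
      done    : (B, 1)          terminal flag
    """
    s, a = batch["s"], batch["a_chunk"]
    r, s_next, done = batch["r_chunk"], batch["s_next"], batch["done"]
    # Discounted return accumulated across the chunk.
    discounts = GAMMA ** torch.arange(H, dtype=r.dtype, device=r.device)
    r_sum = (r * discounts).sum(dim=1, keepdim=True)
    # V regresses toward Q via expectile loss (standard IQL), but Q
    # scores the entire action chunk rather than a single action.
    with torch.no_grad():
        q_target = q_net(torch.cat([s, a], dim=-1))
    v_loss = expectile_loss(q_target - v_net(s))
    # Q bootstraps H steps ahead through V.
    with torch.no_grad():
        target = r_sum + (1.0 - done) * (GAMMA ** H) * v_net(s_next)
    q_loss = F.mse_loss(q_net(torch.cat([s, a], dim=-1)), target)
    return v_loss, q_loss
```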
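For policy extraction, advantage weighting can be folded into diffusion training by reweighting each chunk's denoising loss with exp(A/β). Here `policy.denoise_loss` stands in for a per-sample noise-prediction loss, and `beta` and `w_max` are assumed values, not the paper's.

```python
def awr_diffusion_loss(policy, q_net, v_net, batch, beta=3.0, w_max=100.0):
    """Advantage-weighted denoising loss for an action-chunked
    diffusion policy (hypothetical `policy.denoise_loss` interface)."""
    s, a = batch["s"], batch["a_chunk"]
    with torch.no_grad():
        # Chunk-level advantage from the critics trained above.
        adv = q_net(torch.cat([s, a], dim=-1)) - v_net(s)
        weight = torch.clamp(torch.exp(adv / beta), max=w_max)
    # denoise_loss returns a per-sample loss of shape (B, 1); chunks
    # the critic prefers dominate training instead of uniform cloning.
    return (weight * policy.denoise_loss(s, a)).mean()
```

The effect is that chunks scored above the value baseline are upweighted, which is how the extracted policy can improve on the randomized sub-optimal WBC demonstrations it was trained from.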