DreamControl-v2: Simpler and Scalable Autonomous Humanoid Skills via Trainable Guided Diffusion Priors

arXiv cs.RO / 4/2/2026


Key Points

  • The paper introduces DreamControl-v2, which aims to make autonomous loco-manipulation skills for humanoid robots more robust by improving on the original DreamControl framework, in which off-the-shelf human motion diffusion models guided RL training.
  • Instead of relying on an off-the-shelf human motion prior, DreamControl-v2 trains a guided diffusion model directly in the humanoid robot’s own motion space using a unified embodiment space built from diverse human and robot datasets.
  • The approach increases the variety of learned skills by leveraging a larger, mixed training dataset and reduces human intervention by eliminating manual filtering steps in the pipeline.
  • The authors find that scaling reference trajectory generation is important for producing more robust downstream RL policies.
  • Results are validated through extensive experiments in simulation and on a real Unitree-G1 humanoid platform, demonstrating practical feasibility of the improved training method.
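The core idea in the points above, sampling task-guided reference trajectories from a motion diffusion prior and handing them to RL, can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the denoiser is stubbed with an analytic score toward a nominal pose, and the trajectory dimensions, guidance scale, and target are hypothetical; in DreamControl-v2 the prior would be a diffusion model trained on a unified human-and-robot motion dataset.

```python
import numpy as np

T_STEPS, D = 16, 4       # trajectory length and joint dimension (toy values)
N_DIFFUSION = 50         # number of reverse-diffusion steps
GUIDE_SCALE = 0.5        # strength of the task-guidance term (assumed)

rng = np.random.default_rng(0)
nominal = np.zeros((T_STEPS, D))       # stand-in for the motion prior's mode
target = np.full((T_STEPS, D), 0.8)   # hypothetical task target (e.g. reach pose)

def denoiser_score(x):
    """Stub score function pulling samples toward the prior's mode."""
    return nominal - x

def guidance_grad(x):
    """Gradient of a simple task objective (negative distance to target)."""
    return target - x

x = rng.standard_normal((T_STEPS, D))  # start from Gaussian noise
for _ in range(N_DIFFUSION):
    step = 1.0 / N_DIFFUSION
    noise = rng.standard_normal((T_STEPS, D)) * np.sqrt(step) * 0.1
    # Langevin-style update: prior score plus task guidance plus noise
    x = x + step * (denoiser_score(x) + GUIDE_SCALE * guidance_grad(x)) + noise

# x now approximates a reference trajectory blending prior and task objective;
# a downstream RL policy would track it as an imitation/reward target.
print(x.shape)
```

Scaling this generation step, producing many such guided trajectories rather than a few, is what the authors identify as important for robust downstream RL policies.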

Abstract

Developing robust autonomous loco-manipulation skills for humanoids remains an open problem in robotics. While RL has been applied successfully to legged locomotion, applying it to complex, interaction-rich manipulation tasks is harder, given the long-horizon planning they require. A recent approach along these lines is DreamControl, which addresses these issues by leveraging off-the-shelf human motion diffusion models as a generative prior to guide RL policies during training. In this paper, we investigate the impact of DreamControl's motion prior and propose an improved framework that trains a guided diffusion model directly in the humanoid robot's motion space, aggregating diverse human and robot datasets into a unified embodiment space. We demonstrate that our approach captures a wider range of skills due to the larger training data mixture and establishes a more automated pipeline by removing the need for manual filtering interventions. Furthermore, we show that scaling the generation of reference trajectories is important for achieving robust downstream RL policies. We validate our approach through extensive experiments in simulation and on a real Unitree-G1.