Dreaming the Unseen: World Model-regularized Diffusion Policy for Out-of-Distribution Robustness

arXiv cs.RO / March 24, 2026


Key Points

  • The paper introduces Dream Diffusion Policy (DDP), which couples diffusion-based visuomotor control with a diffusion world model trained using a shared 3D visual encoder to improve out-of-distribution (OOD) robustness.
  • DDP mitigates catastrophic failures by detecting discrepancies between real observations and its autoregressive latent “imagination,” then temporarily abandoning corrupted visual input during inference.
  • Instead of freezing or failing, the policy uses internal predicted latent dynamics to generate imagined trajectories and then smoothly realigns with physical reality once the disruption subsides.
  • Experiments report large gains in OOD performance on MetaWorld (73.8% vs 23.9% without predictive imagination) and under severe real-world spatial shifts (83.3% vs 3.3%).
  • A stress test shows DDP can still reach 76.7% success in real-world conditions when switching to open-loop imagination after initialization, indicating strong resilience beyond closed-loop sensing.
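The fallback behavior described above can be sketched as a simple latent-selection rule. This is a minimal illustration, not the paper's implementation: the function name `select_latent`, the L2 discrepancy metric, and the scalar `threshold` are all assumptions made for clarity.

```python
import math

def l2(a, b):
    """Euclidean distance between two latent vectors (plain Python lists)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_latent(observed, predicted, threshold):
    """Pick the latent fed to the policy at each control step.

    Trust the encoded camera observation unless its discrepancy from the
    world model's autoregressive prediction exceeds the anomaly threshold;
    in that case fall back to the prediction ("imagination"). Once real
    observations agree with the prediction again, the policy naturally
    realigns with physical reality.
    """
    discrepancy = l2(observed, predicted)
    use_imagination = discrepancy > threshold
    return (predicted if use_imagination else observed), use_imagination

# Nominal step: observation matches the forecast, so the camera is trusted.
latent, imagined = select_latent([0.0, 0.0], [0.1, 0.0], threshold=0.5)
# Corrupted step: a large real-imagination gap triggers the imagined latent.
latent, imagined = select_latent([5.0, 5.0], [0.1, 0.0], threshold=0.5)
```

In the actual system the prediction comes from a diffusion world model rolled out in latent space; here both vectors are placeholders.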

Abstract

Diffusion policies excel at visuomotor control but often fail catastrophically under severe out-of-distribution (OOD) disturbances, such as unexpected object displacements or visual corruptions. To address this vulnerability, we introduce the Dream Diffusion Policy (DDP), a framework that deeply integrates a diffusion world model into the policy's training objective via a shared 3D visual encoder. This co-optimization endows the policy with robust state-prediction capabilities. When encountering sudden OOD anomalies during inference, DDP detects the real-imagination discrepancy and actively abandons the corrupted visual stream. Instead, it relies on its internal "imagination" (autoregressively forecasted latent dynamics) to safely bypass the disruption, generating imagined trajectories before smoothly realigning with physical reality. Extensive evaluations demonstrate DDP's exceptional resilience. Notably, DDP achieves a 73.8% OOD success rate on MetaWorld (vs. 23.9% without predictive imagination) and an 83.3% success rate under severe real-world spatial shifts (vs. 3.3% without predictive imagination). Furthermore, as a stress test, DDP maintains a 76.7% real-world success rate even when relying entirely on open-loop imagination post-initialization.
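The co-optimization the abstract refers to can be pictured as a single training objective with two terms flowing through the shared encoder. The sketch below is hypothetical: the function names, the plain MSE losses, and the balancing weight `wm_weight` are assumptions standing in for the paper's actual diffusion losses.

```python
def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def ddp_joint_loss(eps_pred, eps_true, z_next_pred, z_next_true, wm_weight=1.0):
    """Joint objective sketch: diffusion-policy noise-prediction error plus a
    weighted world-model next-latent prediction error. In the framework both
    terms backpropagate into the shared 3D visual encoder, which is what gives
    the policy its state-prediction capability; here we only compute the scalar.
    """
    policy_loss = mse(eps_pred, eps_true)        # action-denoising term
    wm_loss = mse(z_next_pred, z_next_true)      # latent-dynamics term
    return policy_loss + wm_weight * wm_loss

# Toy values: policy MSE = 1.0, world-model MSE = 4.0, half-weighted.
total = ddp_joint_loss([0.0, 0.0], [1.0, 1.0],
                       [0.0, 0.0], [2.0, 2.0], wm_weight=0.5)
```

The design point is that the world model is not a bolted-on auxiliary: because its gradient shapes the same encoder the policy reads from, the policy's latent space is already one in which future states are predictable.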