Morphology-Consistent Humanoid Interaction through Robot-Centric Video Synthesis
arXiv cs.RO / 3/23/2026
Key Points
- Dream2Act introduces a robot-centric, zero-shot interaction framework: given a third-person image of the robot and a target object, it synthesizes plausible robot motion via video generation, avoiding the morphology gap introduced by human-to-robot retargeting.
- A high-fidelity pose extraction system recovers feasible robot-native joint trajectories from the synthesized videos (the "dreams"), which a general-purpose whole-body controller then executes in the robot's own coordinate space (see the sketch after this list).
- Because the whole pipeline stays in robot-native coordinates and needs no task-specific policy training, it sidesteps the morphology mismatch and retargeting errors that typically prevent reliable contact formation.
- In Unitree G1 experiments on four whole-body tasks (ball kicking, sofa sitting, bag punching, box hugging), Dream2Act achieves a 37.5% success rate versus 0% for conventional retargeting.
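To make the three-stage pipeline concrete, below is a minimal Python sketch of the flow the key points describe: synthesize a video, extract robot-native joint trajectories from it, then track them with a whole-body controller. Every name here (generate_interaction_video, extract_robot_pose, WholeBodyController, dream_to_act) is a hypothetical placeholder standing in for the paper's components, not its actual API.

```python
# Hypothetical sketch of a Dream2Act-style pipeline; the stubs mark where
# the paper's pretrained models and controller would plug in.
from dataclasses import dataclass


@dataclass
class JointTrajectory:
    """Per-frame joint vectors in the robot's native coordinate space."""
    positions: list  # shape (T, num_joints)


def generate_interaction_video(robot_image, object_image, task_prompt):
    """Step 1: synthesize a third-person 'dream' of the robot interacting
    with the target object (placeholder for a pretrained video model)."""
    raise NotImplementedError("stand-in for the video generation model")


def extract_robot_pose(video_frames) -> JointTrajectory:
    """Step 2: recover feasible robot-native joint trajectories directly
    from the synthesized frames, so no human-to-robot retargeting occurs."""
    raise NotImplementedError("stand-in for the pose extraction system")


class WholeBodyController:
    """Step 3: a general-purpose whole-body controller tracks the extracted
    trajectory on the physical robot (e.g. a Unitree G1)."""

    def execute(self, trajectory: JointTrajectory) -> bool:
        raise NotImplementedError("stand-in for the low-level controller")


def dream_to_act(robot_image, object_image, task_prompt) -> bool:
    """Zero-shot interaction: no task-specific policy training, and every
    stage stays in the robot's native morphology and coordinates."""
    video = generate_interaction_video(robot_image, object_image, task_prompt)
    trajectory = extract_robot_pose(video)
    return WholeBodyController().execute(trajectory)
```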