Generative Simulation for Policy Learning in Physical Human-Robot Interaction

arXiv cs.RO / 4/13/2026


Key Points

  • The paper proposes a zero-shot “text2sim2real” generative simulation framework for physical human-robot interaction that synthesizes diverse scenarios from natural-language prompts.
  • It uses LLMs and VLMs to procedurally generate soft-body human models, scene layouts, and robot motion trajectories for assistive tasks.
  • The framework enables large-scale synthetic demonstration collection and trains vision-based imitation learning policies using segmented point clouds.
  • A user study on two assistive tasks, scratching and bathing, shows the learned policies achieve zero-shot sim-to-real transfer with success rates above 80% and robustness to unscripted human motion.
  • The authors position this as the first generative simulation pipeline that automates simulation environment synthesis, synthetic data generation, and policy learning for pHRI.
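The three pipeline stages above (scenario synthesis from a prompt, large-scale demonstration collection, and imitation learning) can be sketched as a toy end-to-end loop. This is purely illustrative: every function and field name is hypothetical, the LLM/VLM-driven scene generation is replaced by simple randomization, and the point-cloud imitation policy is replaced by a trivial mean-action stand-in.

```python
# Hypothetical sketch of the text2sim2real pipeline; names are illustrative,
# not the authors' actual API. LLM/VLM generation and point-cloud policy
# learning are replaced by trivial stand-ins.
from dataclasses import dataclass, field
import random


@dataclass
class Scenario:
    """A synthesized pHRI scene: task prompt, human pose, reference trajectory."""
    task: str
    human_pose: list
    trajectory: list = field(default_factory=list)


def text_to_sim(prompt: str, n_scenes: int, seed: int = 0) -> list:
    """Stage 1 (stand-in): procedurally vary scenes for a task prompt.
    In the paper, LLMs/VLMs drive this generation; here we just randomize."""
    rng = random.Random(seed)
    scenes = []
    for _ in range(n_scenes):
        pose = [rng.uniform(-0.1, 0.1) for _ in range(3)]        # e.g. limb offsets
        traj = [[p + 0.01 * t for p in pose] for t in range(5)]  # toy reference path
        scenes.append(Scenario(task=prompt, human_pose=pose, trajectory=traj))
    return scenes


def collect_demos(scenes: list) -> list:
    """Stage 2 (stand-in): roll out each trajectory into (observation, action) pairs.
    The paper's observations are segmented point clouds; raw poses are used here."""
    demos = []
    for s in scenes:
        for obs, nxt in zip(s.trajectory, s.trajectory[1:]):
            action = [b - a for a, b in zip(obs, nxt)]  # delta toward next waypoint
            demos.append((obs, action))
    return demos


def train_policy(demos: list):
    """Stage 3 (stand-in): return the mean demonstrated action for any observation,
    standing in for vision-based imitation learning."""
    n = len(demos)
    mean = [sum(a[i] for _, a in demos) / n for i in range(3)]
    return lambda obs: mean


scenes = text_to_sim("scratch the forearm", n_scenes=10)
demos = collect_demos(scenes)
policy = train_policy(demos)
```

The sketch only conveys the data flow: prompts fan out into many scenes, scenes yield demonstrations, and demonstrations supervise a policy that is then deployed zero-shot.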

Abstract

Developing autonomous physical human-robot interaction (pHRI) systems is limited by the scarcity of large-scale training data to learn robust robot behaviors for real-world applications. In this paper, we introduce a zero-shot "text2sim2real" generative simulation framework that automatically synthesizes diverse pHRI scenarios from high-level natural-language prompts. Leveraging Large Language Models (LLMs) and Vision-Language Models (VLMs), our pipeline procedurally generates soft-body human models, scene layouts, and robot motion trajectories for assistive tasks. We utilize this framework to autonomously collect large-scale synthetic demonstration datasets and then train vision-based imitation learning policies operating on segmented point clouds. We evaluate our approach through a user study on two physically assistive tasks: scratching and bathing. Our learned policies successfully achieve zero-shot sim-to-real transfer, attaining success rates exceeding 80% and demonstrating resilience to unscripted human motion. Overall, we introduce the first generative simulation pipeline for pHRI applications, automating simulation environment synthesis, data collection, and policy learning. Additional information may be found on our project website: https://rchi-lab.github.io/gen_phri/