3D Generation for Embodied AI and Robotic Simulation: A Survey

arXiv cs.CV / 4/30/2026


Key Points

  • The survey argues that embodied AI and robotics require scalable, diverse, physically grounded 3D assets to support simulation-based training and real-world deployment.
  • It organizes the literature by three roles for 3D generation: as a data generator (articulated, physically grounded, deformable assets), as simulation environments (interactive, task-oriented, controllable/agentic scenes), and as a sim2real bridge (digital twin reconstruction, augmentation, and synthetic demonstrations).
  • The paper emphasizes that success in embodied settings depends on more than visual realism, including kinematic structure, material properties, and interaction/task execution readiness.
  • It identifies key bottlenecks such as limited physical annotations, the mismatch between geometric quality and physical validity, fragmented evaluation methods, and the ongoing sim-to-real gap that still limits reliable transfer.
  • It claims the field is shifting focus from purely visual quality toward interaction readiness, aiming to make 3D generation a dependable foundation for embodied intelligence.

Abstract

Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This paper presents the first survey of 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems. As a *Data Generator*, 3D generation produces simulation-ready objects and assets, including articulated, physically grounded, and deformable content for downstream interaction; as *Simulation Environments*, it constructs interactive and task-oriented worlds, spanning structure-aware, controllable, and agentic scene generation; and as a *Sim2Real Bridge*, it supports digital twin reconstruction, data augmentation, and synthetic demonstrations for downstream robot learning and real-world transfer. We also show that the field is shifting from visual realism toward interaction readiness, and we identify the main bottlenecks that must be addressed before 3D generation can become a dependable foundation for embodied intelligence: limited physical annotations, the gap between geometric quality and physical validity, fragmented evaluation, and the persistent sim-to-real divide. Our project page is at https://3dgen4robot.github.io.