Simulating Infant First-Person Sensorimotor Experience via Motion Retargeting from Babies to Humanoids

arXiv cs.RO / 5/1/2026


Key Points

  • The paper proposes a framework to simulate infants’ multimodal sensorimotor experiences by retargeting motion from baby videos to humanoid robots and simulators.
  • It reconstructs the infant's full 3D body pose from a single video, extracting the skeletal structure and estimating the pose frame by frame, then maps the reconstructed motion onto multiple developmental platforms: the physical iCub robot and the virtual pyCub, EMFANT, and MIMo simulators (see the sketch after this list).
  • Replaying the retargeted motion on these embodiments generates simulated sensory streams, including proprioception (joints and muscles), touch, and vision, enabling richer analysis than approaches that match only kinematics.
  • For the best-matching embodiment, the method reports sub-centimeter retargeting accuracy, supporting both developmental-science studies and improved automated behavior annotation.
  • The authors release code publicly, positioning the framework as a tool for robotics, developmental science, and potential early detection of neurodevelopmental disorders.
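
As a concrete illustration of the retargeting step, here is a minimal sketch that maps one frame of reconstructed 3D keypoints to a humanoid joint angle. The keypoint layout, the joint name, and the `retarget_frame` helper are illustrative assumptions, not the paper's actual pipeline, which retargets the full body onto each embodiment's kinematics.

```python
import numpy as np

# Hypothetical keypoint indices for one arm; the paper's skeleton model
# and retargeting procedure are more involved.
SHOULDER, ELBOW, WRIST = 0, 1, 2

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at keypoint b (radians) between segments b->a and b->c."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def retarget_frame(keypoints_3d: np.ndarray) -> dict:
    """Map one frame of reconstructed 3D keypoints (N x 3, meters)
    to a dictionary of humanoid joint angles."""
    elbow_flexion = joint_angle(keypoints_3d[SHOULDER],
                                keypoints_3d[ELBOW],
                                keypoints_3d[WRIST])
    # A full retargeter would solve for all joints, respect the robot's
    # joint limits, and compensate for differing limb proportions.
    return {"l_elbow_flexion": elbow_flexion}

# Example: one synthetic frame with the elbow bent at ~90 degrees.
frame = np.array([[0.0, 0.0, 0.0],    # shoulder
                  [0.25, 0.0, 0.0],   # elbow
                  [0.25, -0.2, 0.0]]) # wrist
print(retarget_frame(frame))  # ~1.57 rad
```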

Abstract

Motion retargeting from humans to human-like artificial agents is becoming increasingly important as humanoid robots grow more capable. However, most existing approaches focus only on reproducing kinematics and ignore the rich sensorimotor experience associated with human movement. In this work, we present a framework for simulating the multimodal sensorimotor experiences of infants using physical and virtual humanoids. From a single video, our method reconstructs the infant's body configuration by extracting its skeletal structure and estimating the full 3D pose from each frame. Then we map the reconstructed motion onto several developmental platforms: the physical iCub robot and the virtual simulators pyCub, EMFANT and MIMo. Replaying the retargeted motions on these embodiments produces simulated multisensory streams including proprioception (joints and muscles), touch, and vision. For the best-matching embodiment, the retargeting achieves sub-centimeter accuracy and enables a rich multimodal analysis of infant development as well as enhanced automated annotation of behaviors. This framework provides a unique window into the infant's sensorimotor experience, offering new tools for robotics, developmental science, and early detection of neurodevelopmental disorders. The code is available at https://github.com/ctu-vras/motion-retargeting/.
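
The abstract does not spell out the metric behind the sub-centimeter claim; a plausible reading is a mean per-keypoint position error between the reconstructed infant pose and the pose replayed on the robot. The sketch below, with function names of our own choosing and synthetic data, shows how such a check might be computed.

```python
import numpy as np

def mean_position_error(infant_kpts: np.ndarray,
                        robot_kpts: np.ndarray) -> float:
    """Mean Euclidean distance (meters) between corresponding keypoints
    over all frames. Both arrays have shape (frames, keypoints, 3) and
    are assumed to live in the same scale-aligned reference frame."""
    assert infant_kpts.shape == robot_kpts.shape
    per_kpt = np.linalg.norm(infant_kpts - robot_kpts, axis=-1)
    return float(per_kpt.mean())

# Synthetic check: a trajectory perturbed by ~5 mm noise stays sub-centimeter.
rng = np.random.default_rng(0)
ref = rng.uniform(-0.3, 0.3, size=(100, 17, 3))
err = mean_position_error(ref, ref + rng.normal(0, 0.005, ref.shape))
print(f"mean error: {err * 100:.2f} cm")  # well under 1 cm
```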