Human-Robot Copilot for Data-Efficient Imitation Learning

arXiv cs.RO / 4/7/2026


Key Points

  • The paper introduces the “Human-Robot Copilot” framework to improve data-efficient imitation learning when only a small number of teleoperation demonstrations are available.
  • It targets the problem of policies drifting into out-of-distribution (OOD) states caused by compounding errors or environmental stochasticity.
  • The proposed approach extends the Human-Gated DAgger (HG-DAgger) idea by using a scaling factor for dexterous teleoperation while keeping compatibility across many industrial and research robot manipulators.
  • Experiments show that, given the same number of demonstration trajectories, the framework achieves higher task performance than prior interactive/human-in-the-loop methods.
  • Because human corrective interventions are needed only intermittently, the overall data collection process is more efficient and requires less time than continuous correction strategies.
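
To make the human-gated idea in the points above concrete, here is a minimal sketch of an HG-DAgger-style collection loop on a toy 1-D system. It is not the paper's implementation: the function `collect_gated_episode`, the toy policy, the expert, the gate threshold, and the dynamics are all hypothetical and chosen only to show that corrections are recorded on the steps where the human takes over.

```python
def collect_gated_episode(policy, expert, gate, step, state, horizon=50):
    """Roll out `policy`; whenever `gate(state)` is true, the human takes over.

    Returns the (state, expert_action) pairs recorded during interventions
    and the final state. All callables are hypothetical stand-ins.
    """
    corrections = []
    for _ in range(horizon):
        if gate(state):
            # Human intervention: act with the expert and keep the label.
            action = expert(state)
            corrections.append((state, action))
        else:
            # Autonomous execution: nothing is recorded on these steps.
            action = policy(state)
        state = step(state, action)
    return corrections, state


# Toy setup (assumed): the goal is to stay near 0, but the learned policy
# drifts by +0.1 per step. The human intervenes once |state| exceeds 1.0.
corrections, final_state = collect_gated_episode(
    policy=lambda s: 0.1,
    expert=lambda s: -0.5 * s,
    gate=lambda s: abs(s) > 1.0,
    step=lambda s, a: s + a,
    state=0.0,
)
print(len(corrections), "corrective labels collected over 50 steps")
```

In a full pipeline the recorded pairs would be merged into the demonstration set and the policy retrained; that step is omitted here.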

Abstract

Collecting human demonstrations via teleoperation is a common approach for teaching robots task-specific skills. However, when only a limited number of demonstrations are available, policies are prone to entering out-of-distribution (OOD) states due to compounding errors or environmental stochasticity. Existing interactive imitation learning or human-in-the-loop methods try to address this issue by following the Human-Gated DAgger (HG-DAgger) paradigm, an approach that augments demonstrations through selective human intervention during policy execution. Nevertheless, these approaches struggle to balance dexterity and generality: they either provide fine-grained corrections but are limited to specific kinematic structures, or achieve generality at the cost of precise control. To overcome this limitation, we propose the Human-Robot Copilot framework that can leverage a scaling factor for dexterous teleoperation while maintaining compatibility with a wide range of industrial and research manipulators. Experimental results demonstrate that our framework achieves higher performance with the same number of demonstration trajectories. Moreover, since corrective interventions are required only intermittently, the overall data collection process is more efficient and less time-consuming.
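
The abstract does not say how the scaling factor is applied. One common reading, shown below purely as an assumption, is that the operator's hand displacement is multiplied by a factor before being added to the end-effector target in Cartesian space, so that a factor below 1 turns large hand motions into fine robot motions. The names `scale_teleop_delta` and `apply_delta` and the numeric values are hypothetical.

```python
def scale_teleop_delta(operator_delta_m, scale):
    """Scale an operator displacement (metres, xyz) by a positive factor.

    Assumed interpretation: scale < 1 gives finer robot motion than the
    hand motion, scale > 1 amplifies it.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return tuple(scale * d for d in operator_delta_m)


def apply_delta(target_xyz, delta_xyz):
    """Add a Cartesian displacement to the current end-effector target."""
    return tuple(p + d for p, d in zip(target_xyz, delta_xyz))


# Hypothetical numbers: a 10 cm hand motion along x with a 0.25 scale
# moves the end-effector target by 2.5 cm.
hand_delta = (0.10, 0.0, -0.02)
robot_delta = scale_teleop_delta(hand_delta, scale=0.25)
new_target = apply_delta((0.40, 0.00, 0.30), robot_delta)
print(robot_delta, new_target)
```

Working on Cartesian end-effector targets rather than joint angles is one way such a scheme could stay independent of a specific kinematic structure, though whether the paper does this is not stated in the abstract.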