CubeDAgger: Interactive Imitation Learning for Dynamic Systems with Efficient yet Low-risk Interaction

arXiv cs.RO / 4/23/2026


Key Points

  • The paper addresses a limitation of interactive imitation learning in dynamic systems: mismatched supervision timing between expert and agent can cause abrupt action changes that destabilize the robot.
  • It introduces CubeDAgger, built on a baseline called EnsembleDAgger, to improve robustness while reducing dynamic stability violations in dynamic tasks.
  • CubeDAgger incorporates three key enhancements: a regularization that explicitly activates the supervision-timing threshold, an optimal consensus mechanism over multiple action candidates, and autoregressive colored-noise injection for temporally consistent exploration (a sketch of such colored noise follows this list).
  • Simulation results indicate the learned policies are robust and maintain dynamic stability during interaction.
  • Real-robot scooping experiments show the method can learn from scratch using only about 30 minutes of interaction with a human expert.
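
The colored-noise idea can be made concrete. Below is a minimal sketch, assuming an AR(1) (Ornstein-Uhlenbeck-style) process, a common way to generate temporally correlated exploration noise; the class name, coefficient `beta`, and scale `sigma` are illustrative assumptions, not values from the paper.

```python
import numpy as np

class ColoredNoise:
    """AR(1) colored-noise generator for temporally consistent exploration.

    Successive samples are correlated: n_t = beta * n_{t-1} + sqrt(1 - beta^2) * w_t,
    where w_t is white Gaussian noise. beta in [0, 1) sets the correlation length;
    beta = 0 recovers white noise. Parameter values here are illustrative only.
    """

    def __init__(self, action_dim: int, beta: float = 0.9, sigma: float = 0.05):
        self.beta = beta
        self.sigma = sigma
        self.state = np.zeros(action_dim)

    def sample(self) -> np.ndarray:
        white = np.random.randn(*self.state.shape)
        # The autoregressive update keeps the noise smooth across timesteps,
        # avoiding the abrupt action changes that white noise would cause.
        self.state = self.beta * self.state + np.sqrt(1.0 - self.beta**2) * white
        return self.sigma * self.state


# Usage: perturb the agent's action before sending it to the robot.
noise = ColoredNoise(action_dim=7)
# agent_action = policy(observation)                 # hypothetical policy call
# executed_action = agent_action + noise.sample()
```

Because consecutive samples are correlated, the perturbed action sequence stays smooth over time, which matches the paper's motivation of preserving dynamic stability during exploration.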

Abstract

Interactive imitation learning makes an agent's control policy robust through stepwise supervision from an expert. Recent algorithms mostly employ expert-agent switching systems that reduce the expert's burden by limiting when supervision occurs. However, this approach is useful only for static tasks; in dynamic tasks, timing discrepancies cause abrupt changes in actions, costing the robot its dynamic stability. This paper therefore proposes a novel method, named CubeDAgger, which improves robustness with fewer dynamic-stability violations even in dynamic tasks. The proposed method builds on a baseline, EnsembleDAgger, with three improvements. The first adds a regularization that explicitly activates the threshold for deciding the supervision timing. The second transforms the expert-agent switching system into an optimal consensus system over multiple action candidates. The third injects autoregressive colored noise into the agent's actions for time-consistent exploration. These improvements are verified in simulation, showing that the trained policies are sufficiently robust while maintaining dynamic stability during interaction. Finally, real-robot scooping experiments with a human expert demonstrate that the proposed method can learn robust policies from scratch from just 30 minutes of interaction.

Video: https://youtu.be/kBl3SCTnVEM
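
To make the switch-to-consensus idea concrete, here is a minimal sketch, assuming a simple weighted average of action candidates gated by ensemble disagreement in the spirit of EnsembleDAgger. The paper's actual optimal consensus rule is not specified here, so the blending scheme, threshold, and weights below are all illustrative assumptions.

```python
import numpy as np

def consensus_action(candidates: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted average over action candidates instead of a hard expert/agent switch.

    candidates: (K, action_dim) proposed actions (e.g., ensemble mean and expert);
    weights: (K,) non-negative scores. A convex combination is one simple consensus
    rule; the paper's optimal consensus system may differ.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()        # normalize to a convex combination
    return w @ candidates  # (K,) @ (K, action_dim) -> (action_dim,)


# Ensemble disagreement as a "doubt" signal (EnsembleDAgger-style); toy data.
ensemble_actions = np.array([[0.10, 0.02],
                             [0.12, 0.01],
                             [0.11, 0.03]])    # 3 ensemble members, 2-D action
expert_action = np.array([0.20, 0.00])
doubt = ensemble_actions.var(axis=0).mean()    # high variance = low confidence
expert_weight = 1.0 if doubt > 1e-3 else 0.1   # hypothetical threshold and weights
candidates = np.vstack([ensemble_actions.mean(axis=0), expert_action])
action = consensus_action(candidates, np.array([1.0, expert_weight]))
print(action)  # blends agent and expert smoothly instead of switching abruptly
```

Blending rather than hard switching is one plausible way to avoid the abrupt action changes the abstract attributes to expert-agent switching, since the executed action varies continuously with the gating weight.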