Learning to Race in Minutes: Infoprop Dyna on the Mini Wheelbot

arXiv cs.LG / 5/5/2026


Key Points

  • The paper introduces an approach to reinforcement learning that avoids reliance on carefully engineered physics simulators and domain randomization for sim-to-real transfer.
  • It demonstrates that Infoprop Dyna, an uncertainty-aware model-based reinforcement learning framework, can learn directly from real-world interactions.
  • Using the Mini Wheelbot (an underactuated unicycle robot), the system learns to race around a track in about 11 minutes of real-world experience.
  • The work argues that reinforcement learning can push robots with fast, nonlinear, and unstable dynamics to the limits of their performance by learning directly from real interaction data.
  • Overall, the results suggest that high-performance control policies can be learned on hardware in minutes of wall-clock time, without a physics simulator in the loop.

Abstract

Reinforcement Learning (RL) has the potential to enable robots with fast, nonlinear, and unstable dynamics to reach the limits of their performance. However, most recent advances rely on carefully designed physics-based simulators and domain randomization to achieve successful sim-to-real transfer within reasonable wall-clock time. In this work, we bypass the need for such simulators and demonstrate that Infoprop Dyna, a state-of-the-art uncertainty-aware model-based reinforcement learning (MBRL) framework, can enable robots to learn directly from real-world interactions. Using Infoprop Dyna, the Mini Wheelbot, an underactuated unicycle robot, learns to race around a track within 11 minutes of real-world experience.
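The Dyna recipe the abstract builds on — learn a dynamics model from real transitions, then reuse that model to generate extra "imagined" updates — can be illustrated with a deliberately tiny sketch. This is the classic tabular Dyna-Q skeleton on a 1-D toy task, not the paper's Infoprop Dyna or the Wheelbot's dynamics; all names and numbers here are illustrative:

```python
import random

# Toy task: states 0.0..1.0 in 0.1 steps, actions move left/stay/right,
# reward only at the goal state 1.0. The agent never plans against
# true_step directly; it fits a transition model from observed data and
# replays that model for extra value updates (the core Dyna idea).

STATES = [round(0.1 * i, 1) for i in range(11)]
ACTIONS = [-1, 0, 1]
GOAL = 1.0

def true_step(s, a):
    """Hidden real dynamics (stands in for the physical robot)."""
    return min(max(round(s + 0.1 * a, 1), 0.0), 1.0)

def reward(s):
    return 1.0 if s == GOAL else 0.0

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
model = {}  # learned model: (s, a) -> observed next state

def q_update(s, a, s2, alpha=0.5, gamma=0.9):
    target = reward(s2) + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

random.seed(0)
for episode in range(60):
    s = random.choice(STATES)          # random resets for coverage
    for t in range(20):
        a = random.choice(ACTIONS)     # purely exploratory behavior
        s2 = true_step(s, a)
        q_update(s, a, s2)             # one update from real experience
        model[(s, a)] = s2             # remember the real transition
        for _ in range(10):            # Dyna: ten imagined updates
            ms, ma = random.choice(list(model))
            q_update(ms, ma, model[(ms, ma)])
        s = s2

# Greedy rollout: the learned policy should drive 0.0 toward 1.0.
s = 0.0
for _ in range(15):
    s = true_step(s, max(ACTIONS, key=lambda b: Q[(s, b)]))
print(s)
```

The planning loop is where the sample efficiency comes from: each real transition is cheap to replay many times. An uncertainty-aware MBRL framework in the spirit of Infoprop Dyna replaces the lookup-table model with a learned dynamics model and keeps imagined rollouts within the region where that model is trustworthy, which is what makes the idea viable on real hardware.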