RAPTOR: A Foundation Policy for Quadrotor Control

arXiv cs.RO / 4/7/2026


Key Points

  • RAPTOR learns an adaptive "foundation policy": a single end-to-end policy that can control a wide variety of quadrotors.
  • Whereas existing RL-based neural controllers overfit to a specific environment and break down under the Sim2Real gap or changes to the airframe, RAPTOR targets zero-shot adaptation without system identification or retraining.
  • The method is validated on 10 real quadrotors (32 g to 2.4 kg, with diverse motors, frames, propellers, and flight controllers); a tiny three-layer policy with only 2084 parameters is reported to suffice for zero-shot adaptation.
  • Adaptation is realized as in-context learning, through recurrence in the hidden layer and a Meta-Imitation Learning scheme: a teacher is trained with RL for each of 1000 simulated quadrotors, then the teachers are distilled into a single student.
  • Performance is tested broadly across conditions: trajectory tracking, indoor and outdoor flight, wind disturbance, poking the vehicle, and different propeller types.
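The recurrent in-context adaptation mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the observation and hidden sizes are assumptions (the paper only states three layers and 2084 parameters in total, so this toy network is smaller), and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical dimensions -- assumptions for illustration only.
OBS_DIM, HIDDEN_DIM, ACT_DIM = 15, 16, 4  # state observation -> 4 motor commands

rng = np.random.default_rng(0)

class TinyRecurrentPolicy:
    """Three-layer policy with a recurrent hidden state."""
    def __init__(self):
        self.W_in = rng.normal(0, 0.1, (HIDDEN_DIM, OBS_DIM))    # input layer
        self.W_h = rng.normal(0, 0.1, (HIDDEN_DIM, HIDDEN_DIM))  # hidden recurrence
        self.b_h = np.zeros(HIDDEN_DIM)
        self.W_out = rng.normal(0, 0.1, (ACT_DIM, HIDDEN_DIM))   # output layer
        self.b_out = np.zeros(ACT_DIM)
        self.h = np.zeros(HIDDEN_DIM)  # context carried across control steps

    def reset(self):
        self.h = np.zeros(HIDDEN_DIM)

    def step(self, obs):
        # The recurrence lets the hidden state accumulate evidence about the
        # (unknown) quadrotor's dynamics -- this is the in-context adaptation.
        self.h = np.tanh(self.W_in @ obs + self.W_h @ self.h + self.b_h)
        return np.tanh(self.W_out @ self.h + self.b_out)  # normalized motor commands

def num_params(p):
    return sum(w.size for w in (p.W_in, p.W_h, p.b_h, p.W_out, p.b_out))

policy = TinyRecurrentPolicy()
print(num_params(policy))
```

Because the hidden state persists between control steps, the same fixed weights produce different behavior on different airframes after a few milliseconds of observations, which is what allows zero-shot deployment without system identification.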

Abstract

Humans are remarkably data-efficient when adapting to new unseen conditions, like driving a new car. In contrast, modern robotic control systems, like neural network policies trained using Reinforcement Learning (RL), are highly specialized for single environments. Because of this overfitting, they are known to break down even under small differences like the Simulation-to-Reality (Sim2Real) gap and require system identification and retraining for even minimal changes to the system. In this work, we present RAPTOR, a method for training a highly adaptive foundation policy for quadrotor control. Our method enables training a single, end-to-end neural-network policy to control a wide variety of quadrotors. We test 10 different real quadrotors from 32 g to 2.4 kg that also differ in motor type (brushed vs. brushless), frame type (soft vs. rigid), propeller type (2/3/4-blade), and flight controller (PX4/Betaflight/Crazyflie/M5StampFly). We find that a tiny, three-layer policy with only 2084 parameters is sufficient for zero-shot adaptation to a wide variety of platforms. The adaptation through in-context learning is made possible by using a recurrence in the hidden layer. The policy is trained through our proposed Meta-Imitation Learning algorithm, where we sample 1000 quadrotors and train a teacher policy for each of them using RL. Subsequently, the 1000 teachers are distilled into a single, adaptive student policy. We find that within milliseconds, the resulting foundation policy adapts zero-shot to unseen quadrotors. We extensively test the capabilities of the foundation policy under numerous conditions (trajectory tracking, indoor/outdoor, wind disturbance, poking, different propellers).
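The two-phase Meta-Imitation Learning recipe in the abstract (one RL teacher per sampled quadrotor, then distillation of all teachers into a single student) can be sketched like this. Every component here is a hedged stand-in, not the paper's code: the "teacher" is a random linear controller rather than an RL-trained expert, the dynamics are a placeholder for a quadrotor simulator, and the student is a plain linear map for brevity (the real student is recurrent).

```python
import numpy as np

rng = np.random.default_rng(1)
OBS_DIM, ACT_DIM = 15, 4  # illustrative sizes (assumptions)

def sample_quadrotor():
    # Stand-in for sampling randomized quadrotor dynamics parameters.
    return {"teacher": rng.normal(0, 0.1, (ACT_DIM, OBS_DIM))}

def teacher_action(quad, obs):
    # Stand-in for the RL-trained expert controller of this quadrotor.
    return quad["teacher"] @ obs

def simulate_step(obs):
    # Placeholder dynamics; the real pipeline rolls out a quadrotor simulator.
    return np.clip(obs + 0.01 * rng.normal(size=obs.shape), -1.0, 1.0)

# Phase 1: train one teacher per sampled quadrotor (here: just sample them).
teachers = [sample_quadrotor() for _ in range(100)]  # paper uses 1000

# Phase 2: distill all teachers into a single student via behavior cloning.
W_student = np.zeros((ACT_DIM, OBS_DIM))
lr = 1e-2
for quad in teachers:
    obs = rng.normal(0, 0.1, OBS_DIM)
    for _ in range(20):
        a_t = teacher_action(quad, obs)      # expert label
        a_s = W_student @ obs                # student prediction
        # Gradient step on 0.5 * ||a_s - a_t||^2 w.r.t. W_student.
        W_student -= lr * np.outer(a_s - a_t, obs)
        obs = simulate_step(obs)
```

The split matters: per-quadrotor RL teachers only need to solve one fixed environment each, while the distillation step is plain supervised learning, which is what makes training a single adaptive student across 1000 platforms tractable.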