VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control

arXiv cs.RO · March 24, 2026


Key Points

  • The paper proposes VisFly-Lab, a unified, extensible differentiable-simulation framework for first-order reinforcement learning aimed at multi-task quadrotor control (hovering, tracking, landing, and racing).
  • It provides a common wrapped interface and deployment-oriented dynamics to reduce fragmentation across task-specific quadrotor RL settings.
  • The authors identify two training bottlenecks in standard first-order methods: limited state coverage from horizon initialization, and gradient bias from partially non-differentiable rewards.
  • To address these issues, they introduce Amended Backpropagation Through Time (ABPT), combining differentiable rollout optimization, a value-based auxiliary objective, and visited-state initialization to improve robustness.
  • Experiments show the largest gains for tasks with partially non-differentiable rewards, and the paper also reports proof-of-concept real-world deployment with some policy transfer from simulation.
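The ingredients of ABPT listed above can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's implementation: it uses a hypothetical 1-D "hover" system with a proportional controller, forward-mode dual numbers as a stand-in for a differentiable simulator, and a visited-state buffer for rollout initialization. All names (`Dual`, `rollout`, `train`) are illustrative; the value-based auxiliary objective for non-differentiable reward components is only indicated in a comment.

```python
# Hedged sketch (not the paper's code): first-order training through
# differentiable dynamics plus visited-state initialization, loosely
# mirroring the ABPT ingredients described above. Toy 1-D dynamics.
import random

class Dual:
    """Forward-mode dual number tracking d(value)/d(policy gain k)."""
    def __init__(self, v, d=0.0):
        self.v, self.d = v, d
    def __add__(self, o):
        return Dual(self.v + o.v, self.d + o.d)
    def __mul__(self, o):
        return Dual(self.v * o.v, self.v * o.d + self.d * o.v)

def rollout(k_val, x0, horizon=10, dt=0.1):
    """Differentiable rollout: policy u = -k*x, dynamics x' = x + dt*u.
    Returns (loss, visited states); loss.d carries dL/dk via BPTT."""
    k = Dual(k_val, 1.0)            # seed derivative w.r.t. the gain
    x = Dual(x0)
    loss = Dual(0.0)
    visited = []
    for _ in range(horizon):
        u = Dual(-1.0) * k * x      # proportional hover controller
        x = x + Dual(dt) * u        # differentiable simulation step
        loss = loss + x * x         # smooth (differentiable) cost
        visited.append(x.v)
    # For partially non-differentiable rewards, ABPT would add a
    # value-based auxiliary term here (omitted in this sketch).
    return loss, visited

def train(k=0.5, iters=200, lr=0.02):
    random.seed(0)
    buffer = [1.0]                  # visited-state initialization buffer
    for _ in range(iters):
        x0 = random.choice(buffer)  # start rollouts from visited states
        loss, visited = rollout(k, x0)
        buffer = (buffer + visited)[-64:]  # keep recent visited states
        k -= lr * loss.d            # first-order update through BPTT
    return k
```

Sampling initial states from the buffer broadens state coverage beyond a fixed horizon start, which is the intuition behind visited-state initialization in the key points above.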

Abstract

First-order reinforcement learning with differentiable simulation is promising for quadrotor control, but practical progress remains fragmented across task-specific settings. To support more systematic development and evaluation, we present a unified differentiable framework for multi-task quadrotor control. The framework is wrapped, extensible, and equipped with deployment-oriented dynamics, providing a common interface across four representative tasks: hovering, tracking, landing, and racing. We also present a suite of first-order learning algorithms, through which we identify two practical bottlenecks of standard first-order training: limited state coverage caused by horizon initialization, and gradient bias caused by partially non-differentiable rewards. To address these issues, we propose Amended Backpropagation Through Time (ABPT), which combines differentiable rollout optimization, a value-based auxiliary objective, and visited-state initialization to improve training robustness. Experimental results show that ABPT yields the clearest gains in tasks with partially non-differentiable rewards, while remaining competitive in fully differentiable settings. We further provide proof-of-concept real-world deployments showing initial transferability of policies learned in the proposed framework beyond simulation.
