Simple but Stable, Fast and Safe: Achieve End-to-end Control by High-Fidelity Differentiable Simulation

arXiv cs.RO / 4/14/2026


Key Points

  • The paper addresses vision-based obstacle avoidance for quadrotors, noting that common planning methods using point-mass models can produce trajectories that become dynamically infeasible at high speeds.
  • It proposes an end-to-end reinforcement learning policy that maps depth images directly to low-level bodyrate commands using differentiable simulation for training.
  • By combining parameter identification with high-fidelity differentiable simulation, the approach aims to close the sim-to-real gap and to provide accurate analytical gradients for efficient training without expert demonstrations.
  • The resulting inference pipeline is designed to be lightweight and simple, avoiding extra architectural components (e.g., backbone/recurrence/primitives) and relying on direct low-level control.
  • Experiments report improved success rate and lower jerk versus baselines, with strong zero-shot generalization to unseen outdoor environments and flight speeds up to 7.5 m/s, including dense-forest scenarios.
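The training idea in the bullets above — obtaining exact gradients by backpropagating through the simulator itself, rather than estimating them from sampled returns — can be illustrated with a toy example. The 1-D double-integrator dynamics, the linear feedback policy, and all constants below are illustrative assumptions for the sketch, not the paper's actual quadrotor model or network:

```python
# Toy "differentiable simulation" training loop: a 1-D double integrator
# x' = v, v' = u with a linear policy u = -k1*x - k2*v. We roll the
# dynamics out, then hand-derive the reverse pass so the loss gradient
# w.r.t. the policy parameters is analytic, which is what a differentiable
# simulator provides automatically at scale.
DT, T = 0.1, 20

def loss_and_grad(k1, k2, x0=1.0, v0=0.0):
    # Forward rollout, storing the trajectory for the reverse pass.
    xs, vs = [x0], [v0]
    for _ in range(T):
        x, v = xs[-1], vs[-1]
        u = -k1 * x - k2 * v                  # policy: linear state feedback
        xs.append(x + DT * v)                 # x_{t+1} = x_t + dt * v_t
        vs.append(v + DT * u)                 # v_{t+1} = v_t + dt * u_t
    loss = xs[-1] ** 2 + vs[-1] ** 2          # drive the final state to the origin

    # Reverse pass: accumulate dL/dk1, dL/dk2 through the whole rollout.
    gx, gv = 2 * xs[-1], 2 * vs[-1]
    gk1 = gk2 = 0.0
    for t in reversed(range(T)):
        x, v = xs[t], vs[t]
        gk1 += gv * (-DT * x)
        gk2 += gv * (-DT * v)
        gx, gv = gx - gv * DT * k1, gx * DT + gv * (1 - DT * k2)
    return loss, gk1, gk2

# Plain gradient descent on the policy gains -- no sampled returns,
# no expert demonstrations, just the simulator's own gradient.
k1, k2 = 0.0, 0.0
for _ in range(200):
    L, g1, g2 = loss_and_grad(k1, k2)
    k1, k2 = k1 - 0.02 * g1, k2 - 0.02 * g2
```

The same pattern scales up in the paper's setting: the rollout is the identified quadrotor model, the policy is a depth-conditioned network, and the reverse pass is supplied by the differentiable simulator rather than written by hand.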

Abstract

Obstacle avoidance is a fundamental vision-based task essential for enabling quadrotors to perform advanced applications. When planning trajectories, existing approaches, both optimization-based and learning-based, typically regard the quadrotor as a point-mass model, issuing path or velocity commands that are then tracked by an outer-loop controller. However, at high speeds, the planned trajectories sometimes become dynamically infeasible in actual flight, exceeding the capacity of the controller. In this paper, we propose a novel end-to-end policy that directly maps depth images to low-level bodyrate commands, trained by reinforcement learning via differentiable simulation. High-fidelity simulation during training, following parameter identification, significantly reduces the gaps between training, simulation, and the real world. The analytical process of differentiable simulation provides accurate gradients that allow the low-level policy to be trained efficiently without expert guidance. The policy employs a lightweight, maximally simple inference pipeline that runs without explicit mapping, backbone networks, motion primitives, recurrent structures, or backend controllers, and requires neither curriculum learning nor privileged guidance. By sending low-level commands directly to the hardware controller, the method enables full-flight-envelope control and avoids the dynamic-infeasibility issue. Experimental results demonstrate that the proposed approach achieves the highest success rate and the lowest jerk among state-of-the-art baselines across multiple benchmarks. The policy also exhibits strong generalization, deploying zero-shot in unseen outdoor environments, reaching speeds of up to 7.5 m/s, and flying stably through super-dense forest.
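As a rough illustration of how minimal such an inference pipeline can be, the sketch below maps a downsampled depth frame straight to a collective-thrust-and-bodyrate command with a plain two-layer MLP. The network shape, the 16×16 input resolution, the ±5 rad/s bodyrate limit, and the output scaling are all assumptions for this sketch; the paper's actual architecture and command ranges are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 16x16 downsampled depth in, 3 bodyrates + 1 thrust out.
IN, HID, OUT = 16 * 16, 64, 4
W1 = rng.normal(0.0, 0.05, (HID, IN)); b1 = np.zeros(HID)
W2 = rng.normal(0.0, 0.05, (OUT, HID)); b2 = np.zeros(OUT)

def policy(depth_16x16):
    """Depth image -> [roll_rate, pitch_rate, yaw_rate, thrust] command.

    No mapping module, no recurrence, no backend controller: one forward
    pass per frame, with the command sent straight to the flight controller.
    """
    h = np.tanh(W1 @ depth_16x16.ravel() + b1)   # single hidden layer
    out = np.tanh(W2 @ h + b2)                   # bounded in (-1, 1)
    rates = out[:3] * 5.0                        # assumed limit: +/- 5 rad/s
    thrust = (out[3] + 1.0) / 2.0                # normalized thrust in [0, 1]
    return np.concatenate([rates, [thrust]])

# One inference step on a fake depth frame (values in meters).
cmd = policy(rng.uniform(0.0, 10.0, (16, 16)))
```

The point of the sketch is the shape of the pipeline, not the weights: because the output is already a low-level command, the "planned trajectory must be trackable" constraint disappears, which is what the abstract means by avoiding dynamic infeasibility.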