Bi-Level Reinforcement Learning Control for an Underactuated Blimp via Center-of-Mass Reconfiguration

arXiv cs.RO / 5/5/2026

Key Points

  • The paper studies goal-directed tracking control for underactuated blimps by using center-of-mass (CoM) reconfiguration with a compact hardware setup of two thrusters and a movable internal slider.
  • It argues that this design improves energy efficiency and payload capacity but creates strong nonlinear coupling and significant underactuation between CoM dynamics and vehicle motion.
  • To manage these difficulties, the authors propose a bi-level reinforcement learning framework that separates CoM planning (outer policy) from thrust control for reference tracking (inner policy).
  • A two-stage learning strategy and convergence analysis are introduced to stabilize the bi-level RL training process.
  • Extensive simulations and real-world experiments on a 27-goal evaluation set show improved tracking accuracy and robustness over fixed-CoM baselines and PID controllers, with reliable sim-to-real transfer.
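
The separation described above — an outer policy that fixes a CoM configuration once per goal, and an inner policy that issues thrust commands during flight — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the linear/proportional policy forms, and the slider travel range are all assumptions made for clarity (the paper learns both policies with RL).

```python
import numpy as np

class OuterCoMPolicy:
    """Maps a 3-D goal position to a slider offset (the CoM setting),
    chosen once before flight. Hypothetical linear policy for illustration."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
        self.slider_limits = (-0.10, 0.10)  # metres; assumed travel range

    def __call__(self, goal):
        raw = float(self.w @ np.asarray(goal, dtype=float))
        return float(np.clip(raw, *self.slider_limits))

class InnerThrustPolicy:
    """Maps tracking errors against a straight-line reference to two
    thrust commands. Stand-in proportional law for illustration."""
    def __init__(self, k=1.0):
        self.k = k

    def __call__(self, cross_track_err, along_track_err):
        base = self.k * along_track_err   # common mode drives forward
        diff = self.k * cross_track_err   # differential steers back to the line
        return np.clip([base + diff, base - diff], 0.0, 1.0)

# Before flight: the outer policy fixes the CoM for this goal.
outer = OuterCoMPolicy(w=[0.0, 0.0, 0.02])
slider = outer([0.0, 0.0, 3.0])  # goal 3 m above start -> 0.06 m offset

# During flight: the inner policy tracks the straight-line reference.
inner = InnerThrustPolicy(k=0.5)
thrusts = inner(cross_track_err=0.2, along_track_err=0.8)
```

The key structural point the sketch captures is the timescale split: the outer decision is made once per goal and held constant, so the inner policy only ever sees a fixed CoM during an episode.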

Abstract

This paper investigates goal-directed tracking control of underactuated blimps with center-of-mass (CoM) reconfiguration. Unlike conventional overactuated blimp designs that rely on redundant actuation for simplified control, this paper focuses on a compact architecture consisting of two thrusters and a movable internal slider, aiming to improve energy efficiency and payload capacity. This hardware-efficient configuration introduces significant underactuation and strong nonlinear coupling between CoM dynamics and vehicle motion. To address these challenges, this paper proposes a bi-level reinforcement learning framework that explicitly decouples task-level CoM planning from continuous thrust control. The outer policy determines a target-dependent CoM configuration prior to flight, while the inner policy generates thrust commands to track straight-line references. To ensure stable learning, this paper introduces a two-stage learning strategy, supported by a convergence analysis of the resulting bi-level process. Extensive simulations and real-world experiments on a 27-goal evaluation set demonstrate that the proposed method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.
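
The two-stage learning strategy mentioned above can be sketched as the following training skeleton: first train the inner tracking policy under a fixed CoM, then freeze it and train the outer CoM policy against it. Everything here is a stand-in assumption for illustration — the toy return function, the random-search "training", and the parameter ranges are not from the paper, which uses RL updates and provides a convergence analysis for the resulting bi-level process.

```python
import random

def toy_return(com, k):
    """Hypothetical episode return, peaked at com=0.05, k=0.8,
    with a mild coupling term between CoM and controller gain."""
    return -(com - 0.05) ** 2 - (k - 0.8) ** 2 - abs(com * (k - 0.8))

def train_inner(env_return, episodes=50, com=0.0):
    """Stage 1: learn the thrust-policy parameter for a fixed CoM.
    Random search stands in for the paper's RL algorithm."""
    best_k, best_ret = 1.0, -float("inf")
    for _ in range(episodes):
        k = random.uniform(0.1, 2.0)
        ret = env_return(com, k)
        if ret > best_ret:
            best_k, best_ret = k, ret
    return best_k

def train_outer(env_return, inner_k, episodes=50):
    """Stage 2: with the inner policy frozen at inner_k, search over
    CoM settings within the slider's assumed travel range."""
    best_com, best_ret = 0.0, -float("inf")
    for _ in range(episodes):
        com = random.uniform(-0.10, 0.10)
        ret = env_return(com, inner_k)
        if ret > best_ret:
            best_com, best_ret = com, ret
    return best_com

random.seed(0)
k_star = train_inner(toy_return)          # stage 1: inner policy
com_star = train_outer(toy_return, k_star)  # stage 2: outer policy
```

The staging matters because training both levels jointly lets the outer policy shift the CoM under a half-trained inner controller, making each level's learning target non-stationary; fixing one level while training the other is what the paper's two-stage strategy stabilizes (and what its convergence analysis addresses).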