Bi-Level Reinforcement Learning Control for an Underactuated Blimp via Center-of-Mass Reconfiguration

arXiv cs.RO / 5/5/2026

Key Points

  • The paper studies goal-directed tracking control for underactuated blimps by using center-of-mass (CoM) reconfiguration with a compact hardware setup of two thrusters and a movable internal slider.
  • It argues that this design improves energy efficiency and payload capacity but creates strong nonlinear coupling and significant underactuation between CoM dynamics and vehicle motion.
  • To manage these difficulties, the authors propose a bi-level reinforcement learning framework that separates CoM planning (outer policy) from thrust control for reference tracking (inner policy).
  • A two-stage learning strategy and convergence analysis are introduced to stabilize the bi-level RL training process.
  • Extensive simulations and real-world experiments on a 27-goal evaluation set show improved tracking accuracy and robustness over fixed-CoM baselines and PID controllers, with reliable sim-to-real transfer.
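
The separation described above — an outer policy that fixes a CoM configuration once per goal, and an inner policy that issues thrust commands during flight — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the linear/proportional policy forms, and the slider travel range are all assumptions made for clarity (the paper learns both policies with RL).

```python
import numpy as np

class OuterCoMPolicy:
    """Maps a 3-D goal position to a slider offset (the CoM setting),
    chosen once before flight. Hypothetical linear policy for illustration."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
        self.slider_limits = (-0.10, 0.10)  # metres; assumed travel range

    def __call__(self, goal):
        raw = float(self.w @ np.asarray(goal, dtype=float))
        return float(np.clip(raw, *self.slider_limits))

class InnerThrustPolicy:
    """Maps tracking errors against a straight-line reference to two
    thrust commands. Stand-in proportional law for illustration."""
    def __init__(self, k=1.0):
        self.k = k

    def __call__(self, cross_track_err, along_track_err):
        base = self.k * along_track_err   # common mode drives forward
        diff = self.k * cross_track_err   # differential steers back to the line
        return np.clip([base + diff, base - diff], 0.0, 1.0)

# Before flight: the outer policy fixes the CoM for this goal.
outer = OuterCoMPolicy(w=[0.0, 0.0, 0.02])
slider = outer([0.0, 0.0, 3.0])  # goal 3 m above start -> 0.06 m offset

# During flight: the inner policy tracks the straight-line reference.
inner = InnerThrustPolicy(k=0.5)
thrusts = inner(cross_track_err=0.2, along_track_err=0.8)
```

The key structural point the sketch captures is the timescale split: the outer decision is made once per goal and held constant, so the inner policy only ever sees a fixed CoM during an episode.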

Abstract

This paper investigates goal-directed tracking control of underactuated blimps with center-of-mass (CoM) reconfiguration. Unlike conventional overactuated blimp designs that rely on redundant actuation for simplified control, this paper focuses on a compact architecture consisting of two thrusters and a movable internal slider, aiming to improve energy efficiency and payload capacity. This hardware-efficient configuration introduces significant underactuation and strong nonlinear coupling between CoM dynamics and vehicle motion. To address these challenges, this paper proposes a bi-level reinforcement learning framework that explicitly decouples task-level CoM planning from continuous thrust control. The outer policy determines a target-dependent CoM configuration prior to flight, while the inner policy generates thrust commands to track straight-line references. To ensure stable learning, this paper introduces a two-stage learning strategy, supported by a convergence analysis of the resulting bi-level process. Extensive simulations and real-world experiments on a 27-goal evaluation set demonstrate that the proposed method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.
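
The two-stage learning strategy mentioned above can be sketched as the following training skeleton: first train the inner tracking policy under a fixed CoM, then freeze it and train the outer CoM policy against it. Everything here is a stand-in assumption for illustration — the toy return function, the random-search "training", and the parameter ranges are not from the paper, which uses RL updates and provides a convergence analysis for the resulting bi-level process.

```python
import random

def toy_return(com, k):
    """Hypothetical episode return, peaked at com=0.05, k=0.8,
    with a mild coupling term between CoM and controller gain."""
    return -(com - 0.05) ** 2 - (k - 0.8) ** 2 - abs(com * (k - 0.8))

def train_inner(env_return, episodes=50, com=0.0):
    """Stage 1: learn the thrust-policy parameter for a fixed CoM.
    Random search stands in for the paper's RL algorithm."""
    best_k, best_ret = 1.0, -float("inf")
    for _ in range(episodes):
        k = random.uniform(0.1, 2.0)
        ret = env_return(com, k)
        if ret > best_ret:
            best_k, best_ret = k, ret
    return best_k

def train_outer(env_return, inner_k, episodes=50):
    """Stage 2: with the inner policy frozen at inner_k, search over
    CoM settings within the slider's assumed travel range."""
    best_com, best_ret = 0.0, -float("inf")
    for _ in range(episodes):
        com = random.uniform(-0.10, 0.10)
        ret = env_return(com, inner_k)
        if ret > best_ret:
            best_com, best_ret = com, ret
    return best_com

random.seed(0)
k_star = train_inner(toy_return)          # stage 1: inner policy
com_star = train_outer(toy_return, k_star)  # stage 2: outer policy
```

The staging matters because training both levels jointly lets the outer policy shift the CoM under a half-trained inner controller, making each level's learning target non-stationary; fixing one level while training the other is what the paper's two-stage strategy stabilizes (and what its convergence analysis addresses).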