Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances
arXiv cs.RO · March 30, 2026
Key Points
- The paper presents a reinforcement-learning control framework for UAVs with overhead manipulators, targeting poor end-effector trajectory tracking caused by wind and attitude disturbances that couple drone and arm motion.
- It introduces a transformer-based Double DQN (DDQN) agent paired with a meta-adaptive, short-horizon beam-search planner: rather than applying actions directly, the planner uses the learned transformer critic as a forward estimator to evaluate candidate control sequences in a software-in-the-loop (SITL)-style lookahead.
- Planning relies on value estimates over short state sequences for the lookahead, while the DDQN backbone supplies one-step targets that stabilize learning.
- In experiments on a 3-DoF aerial manipulator under identical training conditions, the approach reports a 10.2% reward increase versus baselines and reduces mean tracking error from about 6% to 3%.
- Under base drift disturbances, the proposed planner maintains more stable tip trajectory tracking (reported as around 5 cm tracking error), outperforming fixed-beam and transformer-only variants.
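The beam-search lookahead described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `q_value` (standing in for the learned transformer critic) and `step_model` (standing in for the simulated one-step dynamics), as well as the horizon and beam-width values, are hypothetical placeholders.

```python
def beam_search_plan(state, actions, q_value, step_model, horizon=3, beam_width=4):
    """Return the first action of the best-scoring short action sequence.

    Instead of executing the greedy action, we roll candidate sequences
    forward with a model and score them with a learned critic, then keep
    only the top `beam_width` sequences at each depth.
    """
    # Each beam entry: (cumulative value estimate, predicted state, action sequence)
    beams = [(0.0, state, [])]
    for _ in range(horizon):
        candidates = []
        for value, s, seq in beams:
            for a in actions:
                s_next = step_model(s, a)          # simulated one-step rollout
                v = value + q_value(s_next, a)     # accumulate critic estimate
                candidates.append((v, s_next, seq + [a]))
        # Prune: keep only the top-k candidate sequences (the beam)
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_value, _, best_seq = beams[0]
    return best_seq[0], best_value
```

On a toy 1-D problem (state is a scalar, actions are increments, and the critic rewards proximity to a setpoint), the planner picks the first step of the sequence that reaches and holds the setpoint, rather than the action that looks best one step ahead.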
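For the one-step targets mentioned in the key points, the standard Double DQN rule has the online network select the next action and the target network evaluate it, which reduces the value-overestimation bias of vanilla DQN. The sketch below shows that rule in isolation; `q_online` and `q_target` are hypothetical stand-ins for the paper's networks.

```python
def double_dqn_target(reward, next_state, q_online, q_target, actions, gamma=0.99):
    """One-step Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).

    Decoupling action *selection* (online net) from action *evaluation*
    (target net) is what distinguishes Double DQN from vanilla DQN.
    """
    best_a = max(actions, key=lambda a: q_online(next_state, a))
    return reward + gamma * q_target(next_state, best_a)
```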