Path Planning and Reinforcement Learning-Driven Control of On-Orbit Free-Flying Multi-Arm Robots

arXiv cs.RO / 3/25/2026


Key Points

  • The paper proposes a hybrid motion-planning and control framework that combines trajectory optimization (TO) with reinforcement learning (RL) for free-flying on-orbit multi-arm robots.
  • TO is used to generate feasible, efficient trajectories while explicitly handling dynamic and kinematic constraints, including thruster and arm coordination to improve maneuverability and stability.
  • RL provides adaptive, model-free trajectory tracking under uncertainties and dynamic disturbances, enabling robust control in high-dimensional action spaces and under dynamics mismatches.
  • Simulation-based experiments and two case studies (surface motion with initial contact and a free-floating surface-approximation scenario) show the hybrid method outperforming traditional strategies, with thrusters improving smoothness, safety, and efficiency.
  • The work targets key space-robot challenges such as motion coupling and environmental disturbances, positioning the approach as a foundation for more autonomous and effective space robotic systems.
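The TO stage described above can be illustrated with a minimal, hypothetical sketch: a 1-D double integrator stands in for the thruster-controlled free-flyer base, and a crude coordinate-descent optimizer minimizes terminal error plus control effort. All names here (`simulate`, `optimize_trajectory`) are illustrative, not from the paper, and the paper's actual TO handles full multi-arm kinematic and dynamic constraints.

```python
# Hypothetical sketch of the trajectory-optimization (TO) stage on a
# 1-D double integrator (x'' = u), a stand-in for the free-flyer base.
# The paper's TO is far richer (arm/thruster coordination, constraints).

def simulate(controls, dt=0.1, x0=0.0, v0=0.0):
    """Roll out double-integrator dynamics under a thruster-force sequence."""
    x, v = x0, v0
    traj = [x]
    for u in controls:
        v += u * dt
        x += v * dt
        traj.append(x)
    return traj, v

def optimize_trajectory(target, steps=20, iters=500, lr=0.05, dt=0.1):
    """Coordinate descent on the control sequence: minimize terminal error
    (reach the target at rest) plus a small fuel/effort penalty."""
    controls = [0.0] * steps

    def cost(u):
        traj, v_end = simulate(u, dt)
        terminal = (traj[-1] - target) ** 2 + v_end ** 2
        effort = sum(ui ** 2 for ui in u) * 1e-3
        return terminal + effort

    for _ in range(iters):
        for i in range(steps):
            base = cost(controls)
            for delta in (lr, -lr):
                controls[i] += delta       # try nudging one control input
                if cost(controls) < base:  # keep the nudge if cost drops
                    break
                controls[i] -= delta       # otherwise revert it
    return controls

controls = optimize_trajectory(target=1.0)
traj, v_end = simulate(controls)
```

The resulting `controls` play the role of the TO-generated reference that the RL policy then tracks; real solvers would use gradient-based NLP methods rather than this toy search.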

Abstract

This paper presents a hybrid approach that integrates trajectory optimization (TO) and reinforcement learning (RL) for motion planning and control of free-flying multi-arm robots in on-orbit servicing scenarios. The proposed system integrates TO for generating feasible, efficient paths while accounting for dynamic and kinematic constraints, and RL for adaptive trajectory tracking under uncertainties. The multi-arm robot design, equipped with thrusters for precise body control, enables redundancy and stability in complex space operations. TO optimizes arm motions and thruster forces, reducing reliance on the arms for stabilization and enhancing maneuverability. RL further refines this by leveraging model-free control to adapt to dynamic interactions and disturbances. The experimental results, validated through comprehensive simulations, demonstrate the effectiveness and robustness of the proposed hybrid approach. Two case studies are explored: surface motion with initial contact and a free-floating scenario requiring surface approximation. In both cases, the hybrid method outperforms traditional strategies. In particular, the thrusters notably enhance motion smoothness, safety, and operational efficiency. The RL policy effectively tracks TO-generated trajectories, handling high-dimensional action spaces and dynamics mismatches. This integration of TO and RL combines the strengths of precise, task-specific planning with robust adaptability, ensuring high performance in the uncertain and dynamic conditions characteristic of space environments. By addressing challenges such as motion coupling, environmental disturbances, and dynamic control requirements, this framework establishes a strong foundation for advancing the autonomy and effectiveness of space robotic systems.
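The RL tracking stage can likewise be sketched in miniature: a linear feedback policy tracks a reference trajectory under a dynamics mismatch (the true mass differs from what the planner assumed), and its gains are tuned by model-free hill climbing. This is only a conceptual stand-in, assuming hypothetical names (`rollout`, `train`); the paper trains a far more capable policy over a high-dimensional action space.

```python
# Hypothetical sketch of model-free adaptation for trajectory tracking
# under a dynamics mismatch. A linear feedback "policy" tracks the
# TO reference; hill climbing on its gains stands in for RL training.
import random

def rollout(gains, reference, mass=1.3, dt=0.1):
    """Return the (negative) tracking cost when following `reference`
    with feedback gains, on true dynamics with mismatched mass."""
    kp, kd = gains
    x, v, cost = 0.0, 0.0, 0.0
    for x_ref in reference:
        u = kp * (x_ref - x) - kd * v   # policy action from tracking error
        v += (u / mass) * dt            # true plant differs from the model
        x += v * dt
        cost += (x_ref - x) ** 2
    return -cost                        # higher return = better tracking

def train(reference, iters=300, sigma=0.2, seed=0):
    """Simple hill climbing over policy gains: perturb, keep improvements."""
    rng = random.Random(seed)
    gains = [1.0, 1.0]
    best = rollout(gains, reference)
    for _ in range(iters):
        cand = [g + rng.gauss(0.0, sigma) for g in gains]
        r = rollout(cand, reference)
        if r > best:
            gains, best = cand, r
    return gains, best
```

By construction the trained return is never worse than the initial gains', mirroring the summary's claim that the learned policy compensates for dynamics the planner did not model; an actual implementation would use a policy-gradient or actor-critic method instead of hill climbing.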