Neural Backward Reach-Avoid Tubes with MPC Supervision for High-Dimensional Systems: An Application to Safe Spacecraft Docking

arXiv cs.RO / 5/5/2026



Key Points

The paper tackles autonomous spacecraft docking by learning control policies that provide formal reach-avoid (collision avoidance plus target reachability) guarantees under coupled, high-dimensional translational-rotational dynamics.
It proposes a learning-based Backward Reach-Avoid Tube (BRAT) framework that tightly integrates Hamilton-Jacobi (HJ) reachability structure with MPC-based supervision. The framework targets two limitations: classical HJ solvers are restricted to low-dimensional systems, and purely PDE-based learning struggles when goal and failure sets are tightly coupled, as they are in docking.
In the offline training stage, the authors train a neural approximation of the HJ value function using PDE losses, enhanced by curriculum-driven MPC supervision to stabilize learning and produce more informative value targets.
In the online deployment stage, the learned value function drives two real-time controllers: a value-gradient-driven controller and a value-function-augmented terminal MPC that explicitly enforces reachability at the horizon.
Experiments on a 6D planar docking task, evaluated against grid-based ground truth, and on the full 13D system show higher success rates and better computational efficiency than existing methods.
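To make the value-gradient-driven controller concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. It assumes control-affine dynamics x' = f(x) + g(x)u with box-bounded inputs and a convention in which lower values are better, so the controller minimizes the Hamiltonian term ∇V · g(x)u. The names `numerical_gradient`, `value_gradient_control`, `toy_value`, and `toy_g` are hypothetical, and a quadratic function on a double integrator stands in for the learned network, whose gradient would normally come from automatic differentiation.

```python
import numpy as np


def numerical_gradient(value_fn, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (value_fn(x + step) - value_fn(x - step)) / (2.0 * eps)
    return grad


def value_gradient_control(value_fn, g_fn, x, u_max):
    """Bang-bang input minimizing grad V . g(x) u subject to |u_i| <= u_max.

    Assumed convention: lower value is better, so each input channel
    pushes against the sign of its switching function (g^T grad V).
    A zero switching function yields a zero input.
    """
    grad = numerical_gradient(value_fn, x)
    switching = g_fn(x).T @ grad
    return -u_max * np.sign(switching)


# Toy stand-in for a learned value function: V(x) = x^T P x.
P = np.array([[1.0, 0.5], [0.5, 1.0]])


def toy_value(x):
    return float(x @ P @ x)


def toy_g(x):
    # Double integrator: the input acts on the velocity state only.
    return np.array([[0.0], [1.0]])


u = value_gradient_control(toy_value, toy_g, np.array([1.0, 0.0]), u_max=1.0)
```

In this toy case the gradient at x = (1, 0) is (2, 1), so the switching function is positive and the controller saturates at the lower input bound.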

Abstract

Autonomous spacecraft docking requires control policies that simultaneously ensure collision avoidance and target reachability under coupled, high-dimensional translational-rotational dynamics. Hamilton-Jacobi (HJ) reachability provides formal reach-avoid guarantees, but classical solvers are limited to low-dimensional systems. Learning-based approaches have begun to scale HJ analysis, yet they struggle in reach-avoid settings, especially where goal and failure sets are tightly coupled, as in docking. We propose a learning-based Backward Reach-Avoid Tube (BRAT) framework that addresses this challenge by tightly integrating HJ structure with MPC-based supervision. In the offline phase, we train a neural approximation of the HJ value function using PDE-based losses augmented with curriculum-driven MPC supervision, which provides informative value targets and stabilizes training in regions where purely PDE-based methods fail. In the online phase, the learned value function is deployed through two real-time controllers: (i) a value gradient-driven controller, and (ii) a value-function-augmented terminal MPC that explicitly enforces reachability at the horizon. We evaluate the proposed method on a 6D planar docking problem against grid-based ground truth and then scale to the full 13D system. Across both settings, our approach outperforms existing methods in success rate and computational efficiency.
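The offline objective described above can be sketched as a PDE-residual term plus a supervised term on MPC-generated value targets. This is an illustrative sketch only: the abstract does not give the loss form, the weighting, or the curriculum schedule, so the L1 residual, the squared-error MPC term, the linear horizon ramp, and the names `curriculum_horizon`, `combined_loss`, and `mpc_weight` are all assumptions.

```python
import numpy as np


def curriculum_horizon(step, total_steps, t_final):
    """Assumed schedule: grow the sampled time horizon linearly to t_final."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return frac * t_final


def combined_loss(pde_residual, v_pred, v_mpc, mpc_weight):
    """Mean absolute PDE residual plus weighted MSE against MPC value targets.

    pde_residual: HJ PDE residuals at sampled collocation points
    v_pred:       network value predictions at MPC-labeled states
    v_mpc:        value targets obtained from MPC rollouts (assumed given)
    """
    pde_term = np.mean(np.abs(pde_residual))
    mpc_term = np.mean((v_pred - v_mpc) ** 2)
    return pde_term + mpc_weight * mpc_term


loss = combined_loss(
    pde_residual=np.array([1.0, -1.0]),
    v_pred=np.array([0.0, 2.0]),
    v_mpc=np.array([0.0, 0.0]),
    mpc_weight=0.5,
)
horizon = curriculum_horizon(step=50, total_steps=100, t_final=2.0)
```

Here the supervised term supplies value targets directly at the labeled states, while the PDE term constrains the network at the remaining collocation points.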