Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models

arXiv cs.LG / 4/6/2026


Key Points

  • The paper targets the simulation-to-reality (sim-to-real) gap in reinforcement learning (RL) controllers for spaceflight, where real-world training data are extremely scarce.
  • It proposes the Mutual Information-based Split Variational Autoencoder (MI-VAE), a physics-informed VAE that builds in a physics-based learning bias by modeling the discrepancy between observed trajectories and the predictions of a physics model.
  • The MI-VAE’s latent space is used to generate synthetic trajectory datasets that better respect physical constraints for offline RL training.
  • In a planetary lander benchmark with limited real-world data, augmenting offline RL datasets with MI-VAE-generated samples improves RL performance and policy success rate compared with standard VAE-based augmentation.
  • Overall, the work offers a scalable approach to improving the robustness of autonomous controllers in data-constrained, physics-dominated environments such as space missions.
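
The quantity the key points describe, the discrepancy between observed trajectories and physics model predictions, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-dimensional vertical lander, the names `State`, `physics_step`, and `residuals`, the approximate lunar gravity constant, and the artificial velocity bias that stands in for unmodeled dynamics.

```python
from dataclasses import dataclass

G_MOON = 1.62  # m/s^2, approximate lunar surface gravity (illustrative choice)


@dataclass
class State:
    altitude: float  # m
    velocity: float  # m/s, positive upward


def physics_step(s, thrust_acc, dt):
    """Idealized model: gravity plus commanded thrust acceleration (explicit Euler)."""
    acc = thrust_acc - G_MOON
    return State(s.altitude + s.velocity * dt, s.velocity + acc * dt)


def residuals(observed, thrust_accs, dt):
    """Per-step gap between each observed next state and the one-step physics prediction."""
    out = []
    for s, s_next, u in zip(observed[:-1], observed[1:], thrust_accs):
        pred = physics_step(s, u, dt)
        out.append((s_next.altitude - pred.altitude,
                    s_next.velocity - pred.velocity))
    return out


# Hypothetical "observed" trajectory: the physics model plus a small
# velocity bias per step that the idealized model does not capture.
dt = 0.1
thrust_accs = [1.0, 2.0, 1.5]
observed = [State(100.0, -5.0)]
for u in thrust_accs:
    nxt = physics_step(observed[-1], u, dt)
    observed.append(State(nxt.altitude, nxt.velocity - 0.05))

res = residuals(observed, thrust_accs, dt)
```

A generative model trained on residuals like `res`, rather than on raw states, only has to capture what the physics model misses. That is one plausible reading of the physics-based learning bias described above.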

Abstract

The deployment of reinforcement learning (RL)-based controllers on physical systems is often limited by poor generalization to real-world scenarios, known as the simulation-to-reality (sim-to-real) gap. This gap is particularly challenging in spaceflight, where real-world training data are scarce due to high cost and limited planetary exploration data. Traditional approaches, such as system identification and synthetic data generation, depend on sufficient data and often fail due to modeling assumptions or lack of physics-based constraints. We propose addressing this data scarcity by introducing physics-based learning bias in a generative model. Specifically, we develop the Mutual Information-based Split Variational Autoencoder (MI-VAE), a physics-informed VAE that learns differences between observed system trajectories and those predicted by physics-based models. The latent space of the MI-VAE enables generation of synthetic datasets that respect physical constraints. We evaluate MI-VAE on a planetary lander problem, focusing on limited real-world data and offline RL training. Results show that augmenting datasets with MI-VAE samples significantly improves downstream RL performance, outperforming standard VAEs in statistical fidelity, sample diversity, and policy success rate. This work demonstrates a scalable strategy for enhancing autonomous controller robustness in complex, data-constrained environments.
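
The augmentation step in the abstract, adding generated samples to a small real dataset before offline RL training, can be sketched as follows. This is a sketch under stated assumptions, not the paper's method: a Gaussian fitted to one-step residuals stands in for the MI-VAE, the state is a single scalar velocity, and the names `physics_next`, `fit_gaussian`, and `augment` are hypothetical.

```python
import random
import statistics

G_MOON = 1.62  # m/s^2, approximate lunar surface gravity (illustrative choice)
DT = 0.1       # s, illustrative step size


def physics_next(velocity, thrust_acc):
    """Idealized one-step prediction of vertical velocity."""
    return velocity + (thrust_acc - G_MOON) * DT


def fit_gaussian(samples):
    """Stand-in generative model: mean and sample standard deviation."""
    return statistics.mean(samples), statistics.stdev(samples)


def augment(real_transitions, n_synthetic, seed=0):
    """Return the real (state, action, next_state) tuples plus synthetic ones.

    Each synthetic next state is the physics prediction plus a residual
    sampled from the Gaussian fitted to the real residuals.
    """
    rng = random.Random(seed)
    resid = [ns - physics_next(s, a) for s, a, ns in real_transitions]
    mu, sigma = fit_gaussian(resid)
    synthetic = []
    for _ in range(n_synthetic):
        s, a, _ = rng.choice(real_transitions)  # reuse a real state-action pair
        synthetic.append((s, a, physics_next(s, a) + rng.gauss(mu, sigma)))
    return list(real_transitions) + synthetic


# Hypothetical scarce "real" dataset of three transitions.
real = [(-5.0, 1.0, -5.10), (-4.0, 2.0, -4.00), (-3.0, 1.5, -3.05)]
dataset = augment(real, n_synthetic=10)
```

Because every synthetic transition is anchored to the physics prediction, the generated samples cannot stray far from physically plausible motion. The abstract attributes a similar property to the MI-VAE latent space, which replaces the Gaussian used here with a learned model.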