Focal plane wavefront control with model-based reinforcement learning

arXiv cs.RO / 4/2/2026


Key Points

  • The paper addresses non-common-path aberrations (NCPAs) that limit high-contrast imaging for directly detecting potentially habitable exoplanets, where speckle noise and static aberrations degrade observations near bright host stars.
  • It proposes a model-based reinforcement learning approach, Policy Optimization for NCPAs (PO4NCPA), which uses sequential phase diversity and focal-plane images to compute phase corrections without prior system knowledge.
  • Numerical simulations cover static NCPAs on a ground-based telescope and dynamic NCPAs on an infrared imager affected by water-vapor-induced seeing; in both cases PO4NCPA robustly compensates the aberrations.
  • In static scenarios, the method achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one; in dynamic scenarios, it matches modal least-squares reconstruction combined with a 1-step delay integrator on the same metrics.
  • The approach remains effective for the ELT pupil, for a vector vortex coronagraph, and under photon and background noise; its sub-millisecond inference time makes it suitable for real-time low-order atmospheric correction.
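
To make the role of phase diversity concrete, here is a minimal toy sketch in standard-library Python. It is not PO4NCPA and not the paper's simulation: the 1-D pupil, the sample counts, and the helper names (`focal_intensity`, `defocus`, `max_diff`) are all assumptions chosen for illustration. It shows that, for a symmetric pupil, an even aberration and its negative give the same focal-plane image, and that adding a known even phase offset (the diversity) breaks this sign ambiguity.

```python
import cmath
import math

N = 32  # samples across a symmetric 1-D pupil (toy stand-in for a real aperture)
X = [(m - (N - 1) / 2) / ((N - 1) / 2) for m in range(N)]  # pupil coordinate in [-1, 1]


def focal_intensity(phase, n_out=64):
    """Focal-plane intensity |DFT(pupil field)|^2, normalised so that a flat
    wavefront gives 1 at index 0 (the on-axis sample)."""
    field = [cmath.exp(1j * p) for p in phase]  # unit-amplitude pupil
    image = []
    for k in range(n_out):
        amp = sum(f * cmath.exp(-2j * math.pi * k * m / n_out)
                  for m, f in enumerate(field))
        image.append(abs(amp) ** 2 / N ** 2)
    return image


def defocus(coeff):
    """Even, defocus-like phase in radians: coeff * x^2 (hypothetical mode)."""
    return [coeff * x * x for x in X]


def max_diff(img_a, img_b):
    """Largest pixel-wise difference between two images."""
    return max(abs(a - b) for a, b in zip(img_a, img_b))


ncpa_plus = defocus(+1.0)   # unknown aberration, one sign
ncpa_minus = defocus(-1.0)  # same aberration, opposite sign
probe = defocus(0.5)        # known diversity phase added on top

# Without diversity: the two signs produce the same image (up to rounding).
ambiguous = max_diff(focal_intensity(ncpa_plus), focal_intensity(ncpa_minus))

# With diversity: the images differ, so the sign becomes observable.
resolved = max_diff(
    focal_intensity([a + d for a, d in zip(ncpa_plus, probe)]),
    focal_intensity([a + d for a, d in zip(ncpa_minus, probe)]),
)

print(f"max image difference without diversity: {ambiguous:.2e}")
print(f"max image difference with diversity:    {resolved:.2e}")
```

In a sequential scheme, the known offset would presumably come from the corrections already applied between successive frames rather than from a dedicated probe; that reading is an interpretation of the summary above, not a detail it confirms.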

Abstract

The direct imaging of potentially habitable exoplanets is a prime science case for high-contrast imaging (HCI) instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPAs). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPAs by leveraging sequential phase diversity. We extend previous work in reinforcement learning (RL) for adaptive optics (AO) to focal-plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), takes the focal-plane image as input and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPAs on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, for a vector vortex coronagraph, and under photon and background noise. PO4NCPA is model-free and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.
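
The reference technique named in the abstract, modal least-squares reconstruction followed by an integrator with a one-step delay, can be sketched in a few lines of standard-library Python. This is a minimal, noise-free toy under stated assumptions: the interaction matrix `D`, the two-mode aberration, the gain, and the function names `lstsq` and `run_loop` are hypothetical and are not taken from the paper.

```python
def lstsq(D, s):
    """Least-squares modal estimate: solve (D^T D) a = D^T s.
    Uses Gauss-Jordan elimination; intended for small, well-conditioned D."""
    n = len(D[0])
    A = [[sum(row[i] * row[j] for row in D) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * si for row, si in zip(D, s)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def run_loop(D, aberration, gain=0.5, n_steps=30):
    """Integrator control of a static modal aberration.
    The estimate from frame t is applied at frame t + 1 (one-step delay).
    Returns the residual RMS-like norm at every frame."""
    n = len(aberration)
    command = [0.0] * n  # correction currently applied
    pending = [0.0] * n  # modal estimate from the previous frame
    history = []
    for _ in range(n_steps):
        command = [c - gain * p for c, p in zip(command, pending)]
        residual = [a + c for a, c in zip(aberration, command)]
        history.append(sum(r * r for r in residual) ** 0.5)
        # Linear sensor model (assumption): measurement = D @ residual.
        signal = [sum(d * r for d, r in zip(row, residual)) for row in D]
        pending = lstsq(D, signal)
    return history


# Hypothetical 3-measurement, 2-mode system and a static aberration in radians.
D = [[1.0, 0.2], [0.1, 1.0], [0.5, 0.5]]
history = run_loop(D, aberration=[0.3, -0.2])
print(f"initial residual: {history[0]:.3e}, final residual: {history[-1]:.3e}")
```

In this noise-free toy the residual shrinks by a factor of (1 - gain) per frame, so any gain between 0 and 2 converges; with measurement noise or a time-varying aberration the gain becomes a trade-off between noise propagation and lag.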