Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

arXiv stat.ML / April 16, 2026


Key Points

  • The paper addresses continuous-time stochastic control problems with fully non-Markovian dynamics and unknown model parameters, motivated by settings such as path-dependent SDEs, rough-volatility hedging, and fractional Brownian motion–driven systems.
  • It proposes a Monte Carlo learning framework for the embedded backward dynamic programming equation, built on an off-model training setup: a fixed synthetic dataset is generated under a reference law, and the dynamic programming operators of a target model are recovered by importance sampling with explicit dominating training laws and Radon–Nikodym weights (a toy sketch of this mechanism follows the list).
  • A key contribution is an adaptive update mechanism under parametric model uncertainty that reweights the same training sample for repeated recalibration, avoiding costly regeneration of trajectories.
  • For fixed parameters, the authors establish non-asymptotic error bounds for deep neural network approximation of the embedded dynamic programming equation; for adaptive learning, they derive quantitative estimates that separate Monte Carlo approximation error from model-risk error.
  • Numerical experiments in structured linear-quadratic examples illustrate both the off-model training mechanism and the adaptive importance-sampling update.

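To make the off-model mechanism concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's: a scalar Euler-discretized SDE with drift parameter theta, a single dataset simulated under a reference parameter theta_ref, and self-normalized importance sampling whose Radon–Nikodym weights are ratios of Gaussian transition densities. All names and the specific dynamics are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target dynamics (illustrative, not the paper's model):
#   X_{k+1} = X_k + theta * X_k * dt + sigma * dW_k   (Euler scheme)
# Paths are simulated ONCE under a reference parameter theta_ref;
# expectations under any target theta are then recovered by reweighting.
T, n_steps, n_paths = 1.0, 50, 100_000
dt = T / n_steps
sigma, theta_ref, x0 = 0.3, 0.0, 1.0

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X = np.empty((n_paths, n_steps + 1))
X[:, 0] = x0
for k in range(n_steps):
    X[:, k + 1] = X[:, k] + theta_ref * X[:, k] * dt + sigma * dW[:, k]

def target_expectation(theta, f):
    """Estimate E_theta[f(X_T)] on the fixed reference sample.

    The log Radon-Nikodym weight of the theta-law w.r.t. the reference
    law is a sum over time steps of log-ratios of Gaussian transition
    densities (target drift vs. reference drift).
    """
    dX, Xk = np.diff(X, axis=1), X[:, :-1]
    m_tgt, m_ref = theta * Xk * dt, theta_ref * Xk * dt
    log_w = (((dX - m_ref) ** 2 - (dX - m_tgt) ** 2)
             / (2.0 * sigma ** 2 * dt)).sum(axis=1)
    w = np.exp(log_w - log_w.max())   # stabilize before exponentiating
    w /= w.sum()                      # self-normalized importance sampling
    return float(np.sum(w * f(X[:, -1])))

# Recalibrating theta costs one reweighting pass, not a new simulation.
for theta in (0.1, 0.3, 0.5):
    price = target_expectation(theta, lambda x: np.maximum(x - 1.0, 0.0))
    print(f"theta={theta:3.1f}  E_theta[(X_T - 1)^+] ~ {price:.4f}")
```

The design point is that target_expectation can be evaluated for any theta whose law is dominated by the reference law without touching the simulation step, which is what makes repeated recalibration cheap.
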
Abstract

This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations, rough-volatility hedging, and systems driven by fractional Brownian motion. Building on the discrete skeleton approach developed in earlier work, we propose a Monte Carlo learning methodology for the associated embedded backward dynamic programming equation. Our main contribution is twofold. First, we construct explicit dominating training laws and Radon–Nikodym weights for several representative classes of non-Markovian controlled systems. This yields an off-model training architecture in which a fixed synthetic dataset is generated under a reference law, while the dynamic programming operators associated with a target model are recovered by importance sampling. Second, we use this structure to design an adaptive update mechanism under parametric model uncertainty, so that repeated recalibration can be performed by reweighting the same training sample rather than regenerating new trajectories. For fixed parameters, we establish non-asymptotic error bounds for the approximation of the embedded dynamic programming equation via deep neural networks. For adaptive learning, we derive quantitative estimates that separate Monte Carlo approximation error from model-risk error. Numerical experiments illustrate both the off-model training mechanism and the adaptive importance-sampling update in structured linear-quadratic examples.

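Because the adaptive update reuses one fixed sample, the Monte Carlo part of the error is governed by how far the recalibrated law drifts from the training law. A standard diagnostic for this, used here purely to illustrate the error split described in the abstract and not as the paper's own estimate, is the effective sample size (ESS) of the importance weights. The sketch below repeats the toy setup from the previous example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same toy Euler scheme as above: one fixed sample under theta_ref = 0.
T, n_steps, n_paths = 1.0, 50, 50_000
dt, sigma, theta_ref, x0 = T / n_steps, 0.3, 0.0, 1.0
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X = np.empty((n_paths, n_steps + 1))
X[:, 0] = x0
for k in range(n_steps):
    X[:, k + 1] = X[:, k] + theta_ref * X[:, k] * dt + sigma * dW[:, k]

def log_weights(theta):
    """Log Radon-Nikodym weights of the theta-law w.r.t. the reference law."""
    dX, Xk = np.diff(X, axis=1), X[:, :-1]
    m_tgt, m_ref = theta * Xk * dt, theta_ref * Xk * dt
    return (((dX - m_ref) ** 2 - (dX - m_tgt) ** 2)
            / (2.0 * sigma ** 2 * dt)).sum(axis=1)

def ess(log_w):
    """Effective sample size of self-normalized importance weights."""
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# As recalibrated estimates drift away from theta_ref, the ESS shrinks:
# the Monte Carlo error of the reweighted estimator grows, independently
# of the model-risk error coming from theta itself being misestimated.
for theta in (0.05, 0.2, 0.5, 1.0):
    print(f"theta={theta:4.2f}  ESS = {ess(log_weights(theta)):9.1f} / {n_paths}")
```

When the ESS collapses, reweighting the old sample is no longer reliable and fresh trajectories are warranted; the paper's explicit dominating training laws can be read as a way of keeping this degradation under control by construction.
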