Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control (complete version)
arXiv stat.ML / 4/16/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper addresses continuous-time stochastic control problems with fully non-Markovian dynamics and unknown model parameters, motivated by settings such as path-dependent SDEs, rough-volatility hedging, and fractional Brownian motion–driven systems.
- It proposes a Monte Carlo learning framework for an embedded backward dynamic programming equation based on an off-model training setup: a fixed synthetic dataset is generated once under a reference law, and the target model's dynamic programming operators are recovered via importance sampling, using explicit dominating training laws and Radon–Nikodym weights.
- A key contribution is an adaptive update mechanism under parametric model uncertainty that reweights the same training sample for repeated recalibration, avoiding costly regeneration of trajectories.
- The authors provide non-asymptotic error bounds for the deep neural network approximation of the embedded dynamic programming equation under fixed parameters, and they separate the Monte Carlo approximation error from the model-risk error in the adaptive learning setting.
- Numerical experiments on structured linear-quadratic examples demonstrate both the off-model training scheme and the adaptive importance-sampling update.
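The reweighting idea behind the second and third points can be illustrated with a minimal sketch. The example below is not the paper's method; it is a hypothetical one-dimensional stand-in in which a fixed sample drawn under a reference law (standard normal, playing the role of the dominating training law) is reused to estimate expectations under a family of target laws N(θ, 1) via Radon–Nikodym weights, so recalibrating to a new parameter θ never requires regenerating the sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed synthetic dataset, generated ONCE under the reference law N(0, 1).
# This plays the role of the paper's dominating training law (an assumption
# made for this toy illustration).
n = 200_000
x = rng.standard_normal(n)

def is_estimate(theta, f):
    """Estimate E_theta[f(X)] for X ~ N(theta, 1) by reweighting the
    reference sample with Radon-Nikodym weights dP_theta / dP_ref."""
    # Log-density ratio of N(theta, 1) over N(0, 1) evaluated at x:
    # -(x - theta)^2/2 + x^2/2 = theta*x - theta^2/2.
    log_w = theta * x - 0.5 * theta**2
    return np.mean(np.exp(log_w) * f(x))

# Recalibrate to several parameter values without regenerating trajectories;
# for X ~ N(theta, 1), the exact value of E[X^2] is theta^2 + 1.
for theta in (0.0, 0.5, 1.0):
    est = is_estimate(theta, lambda y: y**2)
    print(f"theta={theta}: estimate {est:.3f}, exact {theta**2 + 1:.3f}")
```

The trade-off this sketch makes visible is the usual one for importance sampling: the farther the target parameter drifts from the reference law, the heavier-tailed the weights become and the larger the Monte Carlo error for the same fixed sample, which is why the paper's bounds separate the Monte Carlo approximation error from the model-risk error.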