Abstract
We formulate error-propagation mitigation in modular digital twins as a sequential decision process. Building on a companion study that used a Hidden Markov Model (HMM) to infer latent error regimes from surrogate-physics residuals, we develop a Markov Decision Process (MDP) in which the inferred regimes serve as states, corrective interventions serve as actions, and a scalar reward encodes the cost-benefit tradeoff between system fidelity and maintenance expense. The baseline transition matrix is extracted from the HMM-learned parameters. We then extend the formulation to a Partially Observable MDP (POMDP) that accounts for imperfect regime classification by maintaining a belief distribution updated via Bayesian filtering, with the HMM confusion matrix serving as the observation model. Both formulations are solved via dynamic programming and validated through Gillespie stochastic simulation. We then benchmark two model-free reinforcement learning algorithms, Q-learning and REINFORCE, to assess whether effective policies can be learned without explicit model knowledge. A systematic comparison of intervention policies shows that the MDP policy achieves the highest cumulative reward and the largest fraction of time in nominal operation, while the POMDP recovers approximately 95\% of MDP performance under realistic observation noise. Sensitivity analyses across observation quality, repair probability, and discount factor confirm the robustness of these conclusions, and the major gaps in the policy hierarchy are statistically significant at $p < 0.001$. The gap between MDP and POMDP performance quantifies the value of information, providing a principled criterion for investing in improved classification accuracy.