Cost-optimal Sequential Testing via Doubly Robust Q-learning
arXiv stat.ML / 4/14/2026
Key Points
- The paper studies how to learn cost-optimal sequential clinical testing policies from retrospective data, where future tests may be missing depending on earlier results (informative missingness).
- It proposes a doubly robust Q-learning framework under a sequential missing-at-random assumption, using path-specific inverse probability weights and auxiliary contrast models to handle test-trajectory heterogeneity.
- The method constructs orthogonal pseudo-outcomes that yield unbiased policy learning if either the acquisition (missingness) model or the contrast model is correctly specified.
- The authors provide theoretical guarantees for the stage-wise estimators (oracle inequalities, convergence rates, and regret and misclassification bounds), and they demonstrate improved cost-adjusted performance in simulations and in an application to a prostate cancer cohort.
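The double-robustness property in the key points can be illustrated with a small simulation. The sketch below is not the paper's estimator: it omits the path-specific weights and the multi-stage structure and uses the textbook augmented inverse probability weighted (AIPW) pseudo-outcome for a single binary "acquire the next test or not" decision. Every name, the data-generating process, and the test cost (`TEST_COST`) are hypothetical choices for illustration. The sketch compares three cases: a correct acquisition model with a wrong outcome model, a wrong acquisition model with a correct outcome model, and both models wrong. It then regresses the pseudo-outcomes on the covariate to obtain a simple cost-thresholded acquisition rule.

```python
import random
import statistics

# Toy single-stage setting (all names and the data-generating process are
# hypothetical): X is a baseline covariate in [0, 1], A = 1 means the next
# test was acquired, and Y is an outcome where larger is better.


def true_acquisition_prob(x):
    """True P(A = 1 | X = x), i.e. the acquisition (missingness) model."""
    return 0.2 + 0.6 * x


def true_outcome_mean(x, a):
    """True E[Y | X = x, A = a]; the implied contrast is 1 + 2x."""
    return x + a * (1.0 + 2.0 * x)


def misspecified_prob(x):
    """Wrong acquisition model: ignores X entirely."""
    return 0.5


def misspecified_mean(x, a):
    """Wrong outcome model: predicts zero for both actions."""
    return 0.0


def simulate(n, seed=0):
    """Draw n (x, a, y) triples from the toy model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        a = 1 if rng.random() < true_acquisition_prob(x) else 0
        y = true_outcome_mean(x, a) + rng.gauss(0.0, 1.0)
        data.append((x, a, y))
    return data


def dr_pseudo_outcome(x, a, y, prop, mu):
    """AIPW pseudo-outcome for the contrast E[Y(1) - Y(0) | X = x].

    Its conditional mean equals the true contrast if either prop (the
    acquisition model) or mu (the outcome model) is correctly specified.
    """
    p = prop(x)
    m1, m0 = mu(x, 1), mu(x, 0)
    return (m1 - m0) + a / p * (y - m1) - (1 - a) / (1 - p) * (y - m0)


def fit_line(xs, ys):
    """Ordinary least squares of ys on xs; returns (intercept, slope)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


data = simulate(200_000)
xs = [x for x, _, _ in data]

cases = {
    "correct acquisition, wrong outcome": (true_acquisition_prob, misspecified_mean),
    "wrong acquisition, correct outcome": (misspecified_prob, true_outcome_mean),
    "both wrong": (misspecified_prob, misspecified_mean),
}
pseudo = {}
estimates = {}
for name, (prop, mu) in cases.items():
    pseudo[name] = [dr_pseudo_outcome(x, a, y, prop, mu) for x, a, y in data]
    estimates[name] = statistics.fmean(pseudo[name])
    print(f"{name}: mean contrast {estimates[name]:.3f} (truth 2.0)")

# Policy step: regress pseudo-outcomes on X to estimate the contrast, then
# acquire the test only where the estimated benefit exceeds its cost.
TEST_COST = 2.0  # hypothetical cost, on the same scale as Y
b0, b1 = fit_line(xs, pseudo["wrong acquisition, correct outcome"])
threshold = (TEST_COST - b0) / b1  # assumes b1 > 0
print(f"contrast ~ {b0:.2f} + {b1:.2f} x; acquire the test if x > {threshold:.2f}")
```

In this toy model the average contrast is 2.0. The first two cases should land close to it, while the "both wrong" case targets a population value of 2.4, so at least one nuisance model has to be right. Because the true contrast 1 + 2x crosses the cost of 2.0 at x = 0.5, the fitted rule should acquire the test roughly when x > 0.5. In a genuinely sequential problem, this regression step would be repeated backward over stages, with later-stage values feeding earlier pseudo-outcomes, as in standard Q-learning.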
