Cost-optimal Sequential Testing via Doubly Robust Q-learning

arXiv stat.ML / 4/14/2026


Key Points

  • The paper studies how to learn cost-optimal sequential clinical testing policies from retrospective data, where future tests may be missing depending on earlier results (informative missingness).
  • It proposes a doubly robust Q-learning framework under a sequential missing-at-random assumption, using path-specific inverse probability weights and auxiliary contrast models to handle test-trajectory heterogeneity.
  • The method constructs orthogonal pseudo-outcomes that yield unbiased policy learning if either the acquisition (missingness) model or the contrast model is correctly specified.
  • The authors provide theoretical guarantees (oracle inequalities, convergence rates, regret and misclassification bounds) for stage-wise estimators and validate improved cost-adjusted performance via simulations and a prostate cancer cohort application.
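The double-robustness property described above can be illustrated with a minimal single-stage sketch (the paper works with stage-wise, path-specific weights; here all model choices and numbers are illustrative assumptions, not the authors' implementation). An AIPW-style pseudo-outcome combines an acquisition (missingness) model with an outcome/contrast model, and remains unbiased even when the outcome model is deliberately misspecified, so long as the acquisition model is correct:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(-1, 1, n)
y = 2.0 * x + rng.normal(0.0, 1.0, n)   # true mean of y is 0

# Missingness depends on x (missing at random given x): informative
# missingness for any analysis that ignores x.
pi = 1.0 / (1.0 + np.exp(-x))           # acquisition (observation) probability
r = rng.binomial(1, pi)                 # r = 1: y is observed
y_obs = np.where(r == 1, y, 0.0)

# Deliberately misspecified outcome model m(x) = 0 (ignores x entirely),
# paired with the CORRECT acquisition model pi.
m = np.zeros(n)

# Doubly robust (AIPW) pseudo-outcome: unbiased for E[y] if EITHER the
# acquisition model pi OR the outcome model m is correctly specified.
pseudo = m + (r / pi) * (y_obs - m)
dr_est = pseudo.mean()

# Complete-case mean is biased here, because who gets observed depends on x.
cc_est = y_obs[r == 1].mean()

print(f"doubly robust estimate: {dr_est:.3f}")   # close to the truth, 0
print(f"complete-case estimate: {cc_est:.3f}")   # noticeably biased upward
```

The paper's stage-wise estimators apply the same orthogonality idea recursively along test trajectories, with path-specific weights replacing the single `r / pi` factor above.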

Abstract

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.
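The cost-optimal stopping logic in the abstract (test further versus act on current information) can be sketched as a toy backward-induction step. Everything here is a hypothetical illustration, not the paper's estimator: the payoffs, the per-test cost, and the binary biomarker are assumptions chosen to show how a testing cost enters the stage-wise value comparison.

```python
# Toy one-step lookahead: should we pay for a test that reveals a binary
# biomarker, or stop and treat on the prior alone? All numbers are
# hypothetical payoffs chosen for illustration.

COST = 0.1        # assumed per-test acquisition cost
P_POS = 0.3       # assumed prior probability the biomarker is positive

# Stage-2 values once the result is known: utility of the best treatment
# given the revealed biomarker (hypothetical).
V_IF_POS = 1.0
V_IF_NEG = 0.4

def value_stop() -> float:
    """Treat without testing: pick the single best action under the prior."""
    treat_aggressive = P_POS * 1.0 + (1 - P_POS) * 0.1
    treat_conservative = P_POS * 0.2 + (1 - P_POS) * 0.4
    return max(treat_aggressive, treat_conservative)

def value_test() -> float:
    """Expected post-test value, net of the acquisition cost."""
    return P_POS * V_IF_POS + (1 - P_POS) * V_IF_NEG - COST

best = max(value_stop(), value_test())
policy = "test" if value_test() > value_stop() else "stop"
print(f"stop: {value_stop():.2f}, test: {value_test():.2f} -> {policy}")
```

With these numbers, testing is worth its cost (0.48 versus 0.37), so the policy acquires the test; raising `COST` above the value of the information it buys flips the decision to "stop". The paper learns such stage-wise comparisons from retrospective data via the doubly robust Q-functions, rather than from known payoffs as in this sketch.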