Off-Policy Evaluation and Learning for Survival Outcomes under Censoring

arXiv stat.ML / 3/25/2026


Key Points

  • The paper addresses how to optimize and evaluate survival-related objectives (e.g., patient survival or customer retention) from logged data using Off-Policy Evaluation (OPE), avoiding risky online experiments.
  • It argues that standard OPE estimators fail for right-censored outcomes: by ignoring the unobserved survival time beyond the censoring point, they systematically underestimate the true policy performance.
  • The authors propose new censoring-aware estimators, IPCW-IPS and IPCW-DR, based on Inverse Probability of Censoring Weighting to correct for censoring bias (a minimal sketch of the IPS variant follows this list).
  • They prove unbiasedness for the proposed estimators and show that IPCW-DR is doubly robust (consistent if either the propensity-score model or the outcome model is correctly specified).
  • The framework further extends to Off-Policy Learning under budget constraints, validated via simulations and demonstrations on public real-world datasets.
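To make the core correction concrete, here is a minimal sketch of what an IPCW-corrected IPS estimate looks like. This is not the authors' code: the function name `ipcw_ips_value`, the argument layout, and the use of a plug-in censoring-survival estimate `g_hat_at_y` (e.g., from a Kaplan-Meier fit on the censoring times) are illustrative assumptions based on the standard IPCW construction the abstract describes.

```python
import numpy as np

def ipcw_ips_value(pi_e_probs, pi_b_probs, y_obs, uncensored, g_hat_at_y):
    """Sketch of an IPCW-IPS policy value estimate (illustrative, not the paper's code).

    pi_e_probs  : pi_e(a_i | x_i), target-policy probability of each logged action
    pi_b_probs  : pi_b(a_i | x_i), logging-policy probability of the same action
    y_obs       : observed times, min(survival time T_i, censoring time C_i)
    uncensored  : event indicators, 1 if T_i <= C_i, else 0
    g_hat_at_y  : estimated P(C_i > y_i | x_i, a_i), the probability of
                  remaining uncensored past the observed time
    """
    iw = pi_e_probs / pi_b_probs      # standard importance weights
    cw = uncensored / g_hat_at_y      # IPCW correction: upweight uncensored units
    return np.mean(iw * cw * y_obs)   # plain IPS would average iw * y_obs and be biased


# Toy check on synthetic data where the censoring distribution is known:
rng = np.random.default_rng(0)
n = 100_000
pi_b = np.full(n, 0.5)                # logging policy takes the logged action w.p. 0.5
pi_e = np.full(n, 0.8)                # target policy would take it w.p. 0.8
t = rng.exponential(2.0, size=n)      # latent survival times, E[T] = 2
c = rng.exponential(4.0, size=n)      # censoring times, so P(C > y) = exp(-y / 4)
y, d = np.minimum(t, c), (t <= c).astype(float)
print(ipcw_ips_value(pi_e, pi_b, y, d, np.exp(-y / 4.0)))  # ~= (0.8 / 0.5) * 2 = 3.2
```

On the same synthetic data, the naive estimate `np.mean((pi_e / pi_b) * y)` comes out well below 3.2, which is exactly the systematic underestimation the paper attributes to ignoring censoring.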

Abstract

Optimizing survival outcomes, such as patient survival or customer retention, is a critical objective in data-driven decision-making. Off-Policy Evaluation (OPE) provides a powerful framework for assessing such decision-making policies using logged data alone, without the need for costly or risky online experiments in high-stakes applications. However, typical estimators are not designed to handle right-censored survival outcomes, as they ignore unobserved survival times beyond the censoring time, leading to systematic underestimation of the true policy performance. To address this issue, we propose a novel framework for OPE and Off-Policy Learning (OPL) tailored for survival outcomes under censoring. Specifically, we introduce IPCW-IPS and IPCW-DR, which employ the Inverse Probability of Censoring Weighting technique to explicitly deal with censoring bias. We theoretically establish that our estimators are unbiased and that IPCW-DR achieves double robustness, ensuring consistency if either the propensity score or the outcome model is correct. Furthermore, we extend this framework to constrained OPL to optimize policy value under budget constraints. We demonstrate the effectiveness of our proposed methods through simulation studies and illustrate their practical impacts using public real-world data for both evaluation and learning tasks.
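The abstract's double-robustness claim is easiest to see in estimator form. The sketch below is a hedged reading of how an IPCW-DR estimator is typically assembled: an outcome-model baseline plus a censoring-corrected, importance-weighted residual. The names `mu_pi` and `mu_logged` are hypothetical placeholders for the outcome model's predictions, and the exact form is an assumption based on standard doubly robust constructions, not the paper's definition.

```python
import numpy as np

def ipcw_dr_value(pi_e_probs, pi_b_probs, mu_pi, mu_logged,
                  y_obs, uncensored, g_hat_at_y):
    """Sketch of an IPCW-DR policy value estimate (illustrative, not the paper's code).

    mu_pi      : outcome model averaged under the target policy,
                 sum over a of pi_e(a | x_i) * mu_hat(x_i, a)  (direct-method term)
    mu_logged  : mu_hat(x_i, a_i), the outcome model at the logged action
    Remaining arguments are as in the IPCW-IPS sketch above.
    """
    iw = pi_e_probs / pi_b_probs
    cw = uncensored / g_hat_at_y
    # Assuming g_hat is well estimated: if mu_hat is correct, the residual
    # (y_obs - mu_logged) term has mean zero; if the propensity weights are
    # correct, the weighted residual cancels any bias in the mu_pi baseline.
    # Either way the estimate is consistent, which is the double-robustness
    # property the abstract states for IPCW-DR.
    return np.mean(mu_pi + iw * cw * (y_obs - mu_logged))
```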