Off-Policy Evaluation and Learning for Survival Outcomes under Censoring
arXiv stat.ML / 3/25/2026
Key Points
- The paper addresses how to optimize and evaluate survival-related objectives (e.g., patient survival or customer retention) from logged data using Off-Policy Evaluation (OPE), avoiding risky online experiments.
- It argues that standard OPE estimators break down for right-censored outcomes: the true survival time beyond the censoring point is unobserved, and ignoring this systematically underestimates policy performance.
- The authors propose new censoring-aware estimators, IPCW-IPS and IPCW-DR, based on Inverse Probability of Censoring Weighting to correct for censoring bias.
- They prove unbiasedness for the proposed estimators and show IPCW-DR is doubly robust (consistent if either the propensity model or outcome model is correct).
- The framework is further extended to constrained Off-Policy Learning under budget constraints, with validation via simulations and demonstrations on public real-world datasets.
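To make the censoring correction concrete, here is a minimal sketch of an IPCW-IPS-style estimator. This is not the paper's exact formulation; it assumes a simplified setting where the reward is the survival time itself, the logging propensities are known, and an estimate of the censoring survival function `G_hat` (e.g., from a Kaplan-Meier fit) is supplied externally. All names are illustrative.

```python
import numpy as np

def ipcw_ips_value(outcomes, events, pi_target, pi_logging, G_hat):
    """Censoring-aware IPS estimate of a target policy's expected survival time.

    Illustrative sketch (not the paper's exact estimator):
      outcomes   : observed times T_i (event time, or censoring time if censored)
      events     : Delta_i = 1 if the event was observed, 0 if censored
      pi_target  : target-policy probability of the logged action, pi(a_i | x_i)
      pi_logging : logging-policy propensity, pi_0(a_i | x_i)
      G_hat      : estimated probability of being uncensored at T_i, G(T_i | x_i)

    Censored units contribute zero; fully observed units are up-weighted by
    1 / G_hat to compensate (the IPCW correction), then re-weighted by the
    policy ratio exactly as in plain inverse-propensity scoring.
    """
    w_policy = pi_target / pi_logging   # standard importance weight
    w_censor = events / G_hat           # IPCW correction (zero if censored)
    return float(np.mean(w_policy * w_censor * outcomes))
```

As a sanity check of the design: with no censoring (`events` all one, `G_hat` all one) and identical target and logging policies, the estimator reduces to the plain sample mean of the outcomes, which is the behavior a censoring correction should preserve. The doubly robust variant (IPCW-DR) would additionally plug in an outcome model and apply these weights only to its residuals.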