Generating Counterfactual Patient Timelines from Real-World Data

arXiv cs.LG / 4/6/2026

Key Points

The paper proposes an autoregressive self-supervised generative model that learns from large-scale real-world patient timeline data to produce clinically plausible counterfactual trajectories.
The model is trained on data from over 300,000 patients and 400 million timeline entries, and it is intended to simulate alternative clinical scenarios in support of personalized medicine and in silico trials.
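In a validation study on patients hospitalized with COVID-19 in 2023, the model simulated 7-day outcomes after modifying patient age, serum C-reactive protein (CRP), and serum creatinine; in-hospital mortality increased with older age, elevated CRP, and elevated serum creatinine, consistent with clinical expectations.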
The simulations also reproduced medication-response patterns, with remdesivir prescriptions increasing for higher CRP and decreasing for impaired kidney function.
The authors conclude that such generative models can serve as a foundation for counterfactual clinical simulation, despite ongoing methodological challenges in the area.

Abstract

Counterfactual simulation - exploring hypothetical consequences under alternative clinical scenarios - holds promise for transformative applications such as personalized medicine and in silico trials. However, it remains challenging due to methodological limitations. Here, we show that an autoregressive generative model trained on real-world data from over 300,000 patients and 400 million patient timeline entries can generate clinically plausible counterfactual trajectories. As a validation task, we applied the model to patients hospitalized with COVID-19 in 2023, modifying age, serum C-reactive protein (CRP), and serum creatinine to simulate 7-day outcomes. Increased in-hospital mortality was observed in counterfactual simulations with older age, elevated CRP, and elevated serum creatinine. Remdesivir prescriptions increased in simulations with higher CRP values and decreased in those with impaired kidney function. These counterfactual trajectories reproduced known clinical patterns. These findings suggest that autoregressive generative models trained on real-world data in a self-supervised manner can establish a foundation for counterfactual clinical simulation.
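
The counterfactual procedure described in the abstract (edit a few values in a patient's timeline, then let an autoregressive model generate the following 7 days many times and count outcomes) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `(code, value)` timeline format, the function names, and `toy_next_event` are all hypothetical, and the toy model's risk coefficients are invented for illustration with no clinical meaning. In a real setting, `toy_next_event` would be replaced by sampling from the trained generative model.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical timeline entry: (event code, numeric value).
Event = Tuple[str, float]
NextEventFn = Callable[[List[Event], random.Random], Event]


def toy_next_event(history: List[Event], rng: random.Random) -> Event:
    """Hypothetical stand-in for a trained autoregressive model.

    Emits one event per simulated day. The daily death risk rises with the
    most recent age, CRP, and creatinine values in the history. All
    coefficients are made up for illustration only.
    """
    latest: Dict[str, float] = {}
    for code, value in history:
        latest[code] = value  # later entries overwrite earlier ones
    risk = 0.005
    risk += 0.0005 * max(latest.get("age", 60.0) - 60.0, 0.0)
    risk += 0.0002 * latest.get("crp", 0.0)
    risk += 0.01 * max(latest.get("creatinine", 1.0) - 1.0, 0.0)
    risk = min(risk, 1.0)
    if rng.random() < risk:
        return ("death", 1.0)
    return ("day_end", 0.0)


def simulate_mortality(
    history: List[Event],
    overrides: Dict[str, float],
    next_event: NextEventFn,
    n_rollouts: int = 2000,
    horizon_days: int = 7,
    seed: int = 0,
) -> float:
    """Estimate mortality within the horizon by Monte Carlo rollouts.

    `overrides` is the counterfactual edit: every history entry whose code
    appears in it has its value replaced before generation starts.
    """
    rng = random.Random(seed)
    start = [(code, overrides.get(code, value)) for code, value in history]
    deaths = 0
    for _ in range(n_rollouts):
        timeline = list(start)
        for _ in range(horizon_days):
            event = next_event(timeline, rng)
            timeline.append(event)  # autoregressive: output feeds back in
            if event[0] == "death":
                deaths += 1
                break
    return deaths / n_rollouts


if __name__ == "__main__":
    patient = [("age", 65.0), ("crp", 20.0), ("creatinine", 0.9)]
    print("factual:", simulate_mortality(patient, {}, toy_next_event))
    print("older:", simulate_mortality(patient, {"age": 95.0}, toy_next_event))
    print("high CRP:", simulate_mortality(patient, {"crp": 200.0}, toy_next_event))
```

Comparing the factual estimate with each counterfactual estimate mirrors the paper's validation logic: the question is whether the direction of change matches clinical expectation. The absolute numbers from this toy model are not meaningful.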