Machine learning models for estimating counterfactuals in a single-arm inflammatory bowel disease study

arXiv cs.LG / 4/28/2026


Key Points

  • The study evaluates machine-learning “virtual control arms” for single-arm IBD trials by predicting counterfactual outcomes for a treatment arm using models trained on external control data.
  • Five ML counterfactual outcome models were trained on infliximab (IFX)-treated pediatric Crohn’s disease patients to predict 1-year steroid-free clinical remission (SFCR) and a composite of C-reactive protein (CRP) remission plus steroid-free clinical remission (CRP-SFCR) for adalimumab (ADA)-treated patients.
  • The IFX-versus-ADA treatment effect estimates derived from the virtual controls are compared with those obtained by propensity score matching to external controls, which serves as the reference approach.
  • Gradient-boosted (LGBM) modeling produced odds ratios closest to the propensity-score-matched reference, and all 95% confidence intervals supported the same conclusion: no statistically significant difference in primary or secondary outcomes between ADA and IFX.
  • The authors conclude that virtual controls are a viable alternative to costly, slow, or ethically difficult patient recruitment, and propose a pretrained gradient-boosted model for future studies subject to external validation and transportability checks.
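
The virtual-control workflow summarized in the points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the cohorts are simulated, the two covariates are hypothetical, and a plain logistic regression fitted by gradient descent stands in for the gradient-boosted model so that the example runs with the Python standard library alone.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.

    Returns the weights as [bias, w1, w2, ...].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, X):
    # Predicted probability of remission for each patient.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def simulate_cohort(n, rng):
    """Hypothetical cohort: two standardized baseline covariates, binary outcome."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(0.3 + 0.8 * x[0] - 0.5 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

rng = random.Random(0)
X_ifx, y_ifx = simulate_cohort(300, rng)  # external control cohort (assumed data)
X_ada, y_ada = simulate_cohort(100, rng)  # single treated arm (assumed data)

# 1. Train the outcome model on the external controls only.
w = fit_logistic(X_ifx, y_ifx)

# 2. Predict what the treated patients' outcomes would have been on the control drug.
p_counterfactual = predict_proba(w, X_ada)

# 3. Compare the observed treated-arm rate with the virtual control rate.
observed_rate = sum(y_ada) / len(y_ada)
virtual_control_rate = sum(p_counterfactual) / len(p_counterfactual)
print(f"observed: {observed_rate:.3f}, virtual control: {virtual_control_rate:.3f}")
```

The key design point is that the model never sees treated-arm outcomes during training; it only receives treated-arm covariates at prediction time, which is what makes its output a counterfactual estimate rather than a fit to the observed data.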

Abstract

Single-arm trials accelerate study timelines by removing the need to recruit patients for a concurrent control group. However, these designs require an alternative comparator to estimate treatment effects. One approach is to construct a virtual control arm using a machine learning (ML) model trained on external control data to predict the counterfactual outcomes of the treatment arm. Our aim in this study was to leverage virtual controls by developing and evaluating ML-based counterfactual outcome models trained on infliximab (IFX)-treated patients to predict 1-year steroid-free clinical remission (SFCR) and a composite of C-reactive protein remission plus steroid-free clinical remission (CRP-SFCR) for adalimumab (ADA)-treated pediatric Crohn's disease patients, and to compare the resulting IFX-versus-ADA treatment effect estimates with those obtained using propensity score matching to external controls. Five ML algorithms were used to train counterfactual outcome models on the observed IFX cohort data. The resulting models were then used to predict the counterfactual outcomes for the ADA-arm patients. The gradient-boosted model (LGBM) yielded the odds ratio (OR) closest to the propensity-score-matched reference, and the 95% confidence intervals (CIs) for all models agree with the conclusion of the reference study: no statistically significant difference in the primary and secondary outcomes was observed between patients treated with ADA and those treated with IFX. Our study supports virtual controls as a viable and effective substitute for expensive, lengthy, or unethical patient recruitment in an inflammatory bowel disease (IBD) trial. The developed gradient-boosted prediction model can be used as a pretrained model to generate IFX counterfactual predictions in future studies, pending external validation and assessment of transportability.
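
To illustrate how an odds ratio and its 95% confidence interval lead to a "no statistically significant difference" reading, the sketch below computes a Wald (Woolf) interval from a 2x2 table. The counts are hypothetical and are not taken from the study, and the abstract does not state which interval method the authors used. Treating virtual-control counts as if they were observed also ignores the prediction uncertainty of the outcome model, so this is a simplification.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald (Woolf) confidence interval for a 2x2 table.

    a, b: treated-arm counts with / without remission
    c, d: control-arm counts with / without remission
    z:    normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

# Hypothetical counts: 60/100 remissions in the treated arm,
# 55/100 remissions in the (virtual) control arm.
or_, lo, hi = odds_ratio_ci(60, 40, 55, 45)

# An interval that contains 1 is compatible with no difference between arms.
no_significant_difference = lo <= 1.0 <= hi
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

When the interval spans 1, as it does for these illustrative counts, the data do not distinguish the two arms at the 5% level. Agreement between methods is then judged by how close each OR and interval lies to the propensity-score-matched reference.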