Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

arXiv cs.LG / 5/4/2026

Key Points

  • The study examines how the choice of historical observation window affects predictive accuracy for 30-day unplanned hospital readmissions after hip and knee arthroplasty.
  • It evaluates observation windows ranging from the day of surgery to three years prior and finds that unstructured clinical notes need a much shorter window than structured data: performance peaks with notes from just three to six months before surgery.
  • For structured encounter data, predictive performance improves as the time window grows but plateaus after about twelve months.
  • The authors evaluate clinical notes alone, structured data alone, and both combined, encoding the notes with traditional non-neural text encoders and several neural encoders; the temporal patterns are stable across model complexity and encoder choice.
  • The paper argues against the assumption that “more history is always better” and provides modality-specific time-window guidance to optimize readmission prediction models.
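
The observation-window idea in the points above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the record layout, the `records_in_window` helper, the toy records, and the mapping of months to days (90, 180, 365, 1095) are all assumptions made for the example.

```python
from datetime import date, timedelta

def records_in_window(records, surgery_date, window_days):
    """Keep records dated within `window_days` before surgery (surgery day included).

    Hypothetical helper: `records` is assumed to be a list of dicts with a "date" key.
    Anything after the surgery date is excluded so no post-operative data leaks in.
    """
    start = surgery_date - timedelta(days=window_days)
    return [r for r in records if start <= r["date"] <= surgery_date]

# Made-up toy history for one patient (not data from the study).
surgery = date(2020, 6, 1)
records = [
    {"date": date(2020, 6, 1), "event": "pre-op assessment"},
    {"date": date(2020, 3, 15), "event": "orthopaedic consult"},
    {"date": date(2019, 8, 1), "event": "GP visit"},
    {"date": date(2017, 9, 1), "event": "emergency visit"},
    {"date": date(2020, 6, 10), "event": "post-op follow-up"},
]

# Assumed day counts for each window length; the paper's exact definitions may differ.
WINDOWS = {
    "day of surgery": 0,
    "3 months": 90,
    "6 months": 180,
    "12 months": 365,
    "3 years": 1095,
}

counts = {
    name: len(records_in_window(records, surgery, days))
    for name, days in WINDOWS.items()
}
print(counts)
```

In a study like this one, each window would yield a separate training set, and a model would be fitted and scored per window to see where performance peaks or flattens.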

Abstract

With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows, ranging from the day of surgery to three years prior, on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data: maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened, but strictly plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.
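
As a rough illustration of one of the non-neural encoders named in the abstract, the sketch below computes TF-IDF vectors for a handful of notes using only the standard library. It assumes whitespace tokenization and the textbook weighting tf × log(N/df); the abstract does not specify the variant the authors used, and the `tfidf` helper and the notes are made up for the example.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, vectors) using raw term frequency times log(N / document frequency).

    Hypothetical helper: lowercases and splits on whitespace, with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of notes in which each term appears at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

# Made-up toy notes (not data from the study).
notes = ["knee pain on walking", "knee swelling noted", "wound healing well"]
vocab, vectors = tfidf(notes)

# "pain" appears in one note and "knee" in two, so "pain" gets the larger weight.
pain_weight = vectors[0][vocab.index("pain")]
knee_weight = vectors[0][vocab.index("knee")]
print(pain_weight > knee_weight)
```

Restricting `notes` to those written inside a chosen observation window before encoding is what ties the encoder to the time-window question the paper studies.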