Context-Aware Hospitalization Forecasting Evaluations for Decision Support using LLMs

arXiv cs.AI / 4/28/2026


Key Points

  • The study examines how large language models can be used for context-aware forecasting of hospitalizations to support real-time healthcare resource decisions during major disruptions.
  • It compares three methods across 60 U.S. counties: direct LLM forecasting, classical autoregressive models with exogenous inputs (ARX), and a context-augmented hybrid approach called HybridARX.
  • The evaluation emphasizes decision relevance by measuring not only standard forecasting accuracy metrics, but also bias and lead–lag alignment, reflecting operational needs beyond error minimization.
  • Results show HybridARX delivers more stable and better-calibrated forecasts than classical ARX, especially when contextual inputs are noisy.
  • The paper concludes that LLMs are most effective for non-stationary healthcare resource forecasting when integrated into structured hybrid modeling pipelines rather than used standalone.
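To make the hybrid idea concrete, here is a minimal sketch of an ARX-style model in which one exogenous regressor stands in for an LLM-derived context score. All function names and the toy data are illustrative assumptions, not the paper's implementation; the actual HybridARX pipeline may differ in lag order, features, and estimation method.

```python
import numpy as np

def fit_arx(y, x, p=3):
    """Fit a simple ARX(p) model by ordinary least squares.

    y: target series (e.g., daily hospitalizations)
    x: exogenous signal (standing in for an LLM-derived context score)
    Returns coefficients for [intercept, p lags of y, x[t]].
    """
    rows, targets = [], []
    for t in range(p, len(y)):
        rows.append(np.concatenate(([1.0], y[t - p:t], [x[t]])))
        targets.append(y[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef

def forecast_next(y, x_next, coef, p=3):
    """One-step-ahead forecast from the last p observations."""
    feats = np.concatenate(([1.0], y[-p:], [x_next]))
    return float(feats @ coef)

# Toy data: a drifting hospitalization series plus a noisy context proxy.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(1.0, 0.3, 100)) + 50.0
ctx = np.gradient(y) + rng.normal(0.0, 0.1, 100)

coef = fit_arx(y, ctx, p=3)
print(forecast_next(y, ctx[-1], coef))
```

The design point mirrors the paper's conclusion: the context signal enters as one structured regressor among the lags, so a noisy signal can only nudge the forecast rather than drive it outright, which is where the hybrid's stability comes from.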

Abstract

Medical and public health experts must make real-time resource decisions, such as expanding hospital bed capacity, based on projected hospitalization trends during large-scale healthcare disruptions (e.g., operational failures or pandemics). Forecasting models can assist in this task by analyzing large volumes of resource-related data at the facility level, but they must be reliable for decision-making under real-world data conditions. Recent work shows that large language models (LLMs) can incorporate richer forms of context into numerical forecasting. Whereas traditional models rely primarily on temporal context (i.e., past observations), LLMs can also leverage non-temporal public health context such as demographic, geographic, and population-level features. However, it remains unclear how these models should be used to produce stable or decision-relevant predictions in real-world healthcare settings. To assess how LLMs can be effectively used in this setting, we evaluate three approaches across 60 counties with low-, mid-, and high-hospitalization intensities in the United States: direct LLM-based forecasting, classical time-series models, and a context-augmented hybrid pipeline (HybridARX) that incorporates LLM-derived signals into structured models. Because the goal is operational decision-making rather than error minimization alone, we evaluate performance with bias and lead-lag alignment in addition to standard forecasting metrics. Our results show that HybridARX improves over classical ARX by yielding more stable and better-calibrated forecasts, particularly when incorporating noisy contextual signals into structured time-series models. These findings suggest that, in non-stationary healthcare resource forecasting, LLMs are most useful when embedded within structured hybrid models.
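The abstract's lead-lag alignment criterion asks whether a forecast peaks before, with, or after the actual series, not just how far it misses on average. One common way to estimate such an offset is lagged cross-correlation; the sketch below is an assumed formulation for illustration, not necessarily the metric the paper uses.

```python
import numpy as np

def lead_lag(forecast, actual, max_lag=7):
    """Estimate the lag (in time steps) at which the forecast best
    aligns with the actual series, via lagged correlation.

    Positive result: the forecast trails the actuals by that many steps.
    Negative result: the forecast leads. Zero: aligned.
    """
    best_lag, best_corr = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Shift the forecast by `lag` steps and correlate the overlap.
        f = forecast[lag:] if lag >= 0 else forecast[:lag]
        a = actual[:len(actual) - lag] if lag >= 0 else actual[-lag:]
        c = np.corrcoef(f, a)[0, 1]
        if c > best_corr:
            best_lag, best_corr = lag, c
    return best_lag

# A forecast that is a delayed copy of the truth trails it by 3 steps.
t = np.linspace(0, 4 * np.pi, 200)
actual = np.sin(t)
forecast = np.roll(actual, 3)
print(lead_lag(forecast, actual))  # → 3
```

For capacity decisions a forecast that trails the true surge by even a few days can be operationally useless despite a low average error, which is why this metric is evaluated alongside the standard accuracy measures.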