In-Context Learning Under Regime Change

arXiv cs.LG / April 21, 2026


Key Points

  • The paper studies non-stationary settings where the data-generating process changes at unknown times, requiring models to detect shifts and adapt online.
  • It formulates in-context change-point detection for transformer-based foundation models and proves, by explicit construction, that transformer architectures can solve the problem.
  • The authors show that the required model complexity (depth and parameter count) varies with how much the model knows about the change-point timing, ranging from no knowledge to exact timing.
  • Experiments on synthetic linear regression and linear dynamical systems confirm that trained transformers can match optimal baselines under different information assumptions.
  • By encoding change-point knowledge, the approach improves real-world performance of pretrained models on infectious disease forecasting and financial volatility forecasting around FOMC announcements without retraining.
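
To make the detection problem concrete, here is a minimal sketch on a synthetic linear regression stream. The setup (dimensions, the sign-flip regime change, the window size, and the residual-ratio test) is our own toy illustration, not the paper's transformer construction: a least-squares fit on older data is flagged as obsolete once its error on the most recent window blows up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-stationary linear regression stream (our own illustration, not
# the paper's setup): the true weight vector flips sign at t = 100.
d, T, change = 3, 200, 100
w_pre = rng.normal(size=d)
w_post = -w_pre
X = rng.normal(size=(T, d))
y = np.concatenate([X[:change] @ w_pre, X[change:] @ w_post])
y += 0.1 * rng.normal(size=T)

def detect_change(X, y, window=20, threshold=5.0):
    """Flag a regime shift when the squared error of a least-squares fit
    on older data blows up on the most recent window (a simple residual
    test standing in for the paper's in-context detector)."""
    for t in range(2 * window, len(y)):
        # Fit on everything except the most recent window.
        w_hat, *_ = np.linalg.lstsq(X[:t - window], y[:t - window], rcond=None)
        recent_err = np.mean((X[t - window:t] @ w_hat - y[t - window:t]) ** 2)
        older_err = np.mean((X[:t - window] @ w_hat - y[:t - window]) ** 2)
        if recent_err > threshold * older_err:
            return t  # first time the recent window looks off-regime
    return None

print(detect_change(X, y))
```

With these settings the detector fires shortly after the change at t = 100; the window size trades detection delay against false alarms from noise.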

Abstract

Non-stationary sequences arise naturally in control, forecasting, and decision-making. The data-generating process shifts at unknown times, and models must detect the change, discard or downweight obsolete evidence, and adapt to new dynamics on the fly. Transformer-based foundation models increasingly rely on in-context learning for time series forecasting, tabular prediction, and continuous control. As these models are deployed in non-stationary environments, understanding their ability to detect and adapt to regime shifts is important. We formalize this as an in-context change-point detection problem and establish the existence of transformer models that solve it. Our construction demonstrates that model complexity, in layers and parameters, depends on the level of information available about the change-point location, from no knowledge to knowing exact timing. We validate our results with experiments on synthetic linear regression and linear dynamical systems, where trained transformers match the performance of optimal baselines across information levels. We also show that encoding and incorporating change-point knowledge improves the real-world performance of a pretrained foundation model on infectious disease forecasting and on financial volatility forecasting around Federal Open Market Committee (FOMC) announcements without retraining, demonstrating practical applicability to real-world regime changes.
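
The abstract's spectrum of information levels runs from no knowledge of the change point to exact timing. At the exact-timing end, the payoff of discarding obsolete evidence can be illustrated with ordinary least squares (a toy sketch with made-up dimensions and noise level, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression stream whose weights switch at a known change point.
d, change, T = 3, 100, 160
w_pre = rng.normal(size=d)
w_post = rng.normal(size=d)
X = rng.normal(size=(T, d))
y = np.concatenate([X[:change] @ w_pre, X[change:] @ w_post])
y += 0.05 * rng.normal(size=T)

train_end = 150  # fit on the first 150 samples, evaluate on the last 10

# Estimator that ignores the regime change: fits on all history.
w_all, *_ = np.linalg.lstsq(X[:train_end], y[:train_end], rcond=None)

# Estimator given the exact change time: fits on post-change data only.
w_new, *_ = np.linalg.lstsq(X[change:train_end], y[change:train_end], rcond=None)

err_all = np.mean((X[train_end:] @ w_all - y[train_end:]) ** 2)
err_new = np.mean((X[train_end:] @ w_new - y[train_end:]) ** 2)
print(err_new < err_all)  # discarding pre-change evidence lowers forecast error
```

The all-history fit lands on a blend of the two regimes' weights, while the informed fit recovers the post-change weights up to noise, which is exactly the gap the intermediate information levels interpolate between.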