Learning to Adapt: In-Context Learning Beyond Stationarity
arXiv cs.LG / 4/14/2026
Key Points
- The paper investigates how transformer in-context learning (ICL) behaves when task relationships are non-stationary, i.e., the underlying input-output mapping changes over time.
- It provides a theoretical analysis of non-stationary regression settings, modeling task evolution as a first-order autoregressive (AR(1)) process.
- The authors argue that gated linear attention (GLA) adaptively adjusts how much past inputs influence predictions, effectively learning a recency bias.
- They show, both theoretically and empirically, that GLA can achieve lower training and test error than standard (ungated) linear attention on these dynamic ICL tasks; an illustrative sketch follows this list.
- Experiments confirm the usefulness of gating mechanisms for ICL under shifting data-generating processes, addressing a gap left by prior analyses that assume stationarity.
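
The following is a minimal NumPy sketch, not the paper's code, of the setup described above. It assumes the regression weights drift as an AR(1) process, w_t = ρ·w_{t-1} + noise, and compares a plain linear-attention-style in-context estimate (uniform average over context pairs) with a gated variant that exponentially down-weights older pairs. The function names, constants, and the fixed scalar gate are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's implementation): non-stationary ICL regression
# with an AR(1) task drift, comparing a plain linear-attention-style predictor
# against a gated (recency-weighted) variant. All names and constants are assumed.

import numpy as np

rng = np.random.default_rng(0)

d = 8            # input dimension
n_ctx = 64       # in-context examples per prompt
n_prompts = 500  # evaluation prompts
rho = 0.9        # AR(1) coefficient governing task drift
sigma_w = 0.3    # task-drift noise scale
sigma_y = 0.05   # observation noise scale
gate = 0.85      # per-step decay used by the gated predictor (fixed here for illustration)


def make_prompt():
    """One non-stationary ICL prompt: (x_t, y_t) context pairs plus a query.

    Weights evolve as w_t = rho * w_{t-1} + sigma_w * noise, so older context
    pairs were generated by a different task than the query.
    """
    w = rng.normal(size=d)
    xs, ys = [], []
    for _ in range(n_ctx):
        w = rho * w + sigma_w * rng.normal(size=d)   # AR(1) task evolution
        x = rng.normal(size=d)
        xs.append(x)
        ys.append(x @ w + sigma_y * rng.normal())
    # The query is drawn from the most recent task.
    x_q = rng.normal(size=d)
    y_q = x_q @ w + sigma_y * rng.normal()
    return np.array(xs), np.array(ys), x_q, y_q


def linear_attention_predict(xs, ys, x_q):
    """Plain linear-attention-style estimate: uniform average of y_i * <x_i, x_q>."""
    return (ys * (xs @ x_q)).mean()


def gated_linear_attention_predict(xs, ys, x_q, g=gate):
    """Gated variant: exponentially down-weight older context pairs (recency bias)."""
    n = len(ys)
    decay = g ** np.arange(n - 1, -1, -1)   # most recent pair gets weight 1
    weights = decay / decay.sum()
    return (weights * ys * (xs @ x_q)).sum()


if __name__ == "__main__":
    err_plain, err_gated = [], []
    for _ in range(n_prompts):
        xs, ys, x_q, y_q = make_prompt()
        err_plain.append((linear_attention_predict(xs, ys, x_q) - y_q) ** 2)
        err_gated.append((gated_linear_attention_predict(xs, ys, x_q) - y_q) ** 2)
    print(f"plain linear attention MSE: {np.mean(err_plain):.4f}")
    print(f"gated linear attention MSE: {np.mean(err_gated):.4f}")
```

Because the query comes from the most recent task while the uniform average mixes in stale tasks, the gated estimator typically reports lower mean-squared error here, which mirrors the qualitative advantage the paper attributes to GLA's learned recency bias.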




