Transformers Learn Robust In-Context Regression under Distributional Uncertainty

arXiv cs.LG / 3/20/2026

Key Points

  • The authors study in-context learning for noisy linear regression under distributional uncertainty, relaxing assumptions like i.i.d. data and Gaussian noise.
  • Transformers are shown to match or outperform classical maximum-likelihood baselines across a broad range of distributional shifts, including non-Gaussian coefficients, heavy-tailed noise, and non-i.i.d. prompts.
  • The results demonstrate robust in-context adaptation for regression tasks beyond traditional estimators, expanding the practical applicability of in-context learning.
  • The work compares Transformer performance against baselines that are optimal or suboptimal under the corresponding maximum-likelihood criteria, highlighting practical gains over conventional estimators.

Abstract

Recent work has shown that Transformers can perform in-context learning for linear regression under restrictive assumptions, including i.i.d. data, Gaussian noise, and Gaussian regression coefficients. However, real-world data often violate these assumptions: the distributions of inputs, noise, and coefficients are typically unknown, non-Gaussian, and may exhibit dependency across the prompt. This raises a fundamental question: can Transformers learn effectively in-context under realistic distributional uncertainty? We study in-context learning for noisy linear regression under a broad range of distributional shifts, including non-Gaussian coefficients, heavy-tailed noise, and non-i.i.d. prompts. We compare Transformers against classical baselines that are optimal or suboptimal under the corresponding maximum-likelihood criteria. Across all settings, Transformers consistently match or outperform these baselines, demonstrating robust in-context adaptation beyond classical estimators.
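To make the setting concrete, here is a minimal sketch (my own illustration, not code from the paper) of one in-context regression prompt under the kind of distributional shift the abstract describes: Gaussian coefficients are kept, but the noise is heavy-tailed (Student-t), and a classical ridge estimator plays the role of a baseline predictor on the query point. The context length, dimension, degrees of freedom, and ridge penalty are all assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_prompt(n_ctx=20, d=5, heavy_tailed=True):
    """Build one prompt (x_1, y_1, ..., x_n, y_n, x_query, y_query).

    Illustrative only: coefficients are Gaussian, inputs i.i.d. Gaussian,
    and the label noise is Student-t (df=2, heavy-tailed) when requested.
    """
    w = rng.standard_normal(d)                    # regression coefficients
    X = rng.standard_normal((n_ctx + 1, d))       # context inputs + query
    if heavy_tailed:
        eps = rng.standard_t(df=2, size=n_ctx + 1)  # heavy-tailed noise
    else:
        eps = rng.standard_normal(n_ctx + 1)        # Gaussian noise
    y = X @ w + eps
    return X[:-1], y[:-1], X[-1], y[-1]

def ridge_predict(X, y, x_query, lam=1e-2):
    """Classical baseline: ridge regression fit on the context examples.

    Under Gaussian noise and a Gaussian prior on w this is the
    maximum-a-posteriori estimator; under heavy-tailed noise it is
    suboptimal, which is the regime the paper probes.
    """
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w_hat

X, y, x_query, y_query = make_prompt()
pred = ridge_predict(X, y, x_query)
```

A Transformer trained in-context would consume the same `(x_i, y_i)` pairs as a sequence and emit a prediction for `x_query` directly; the paper's comparison is between that prediction and baselines like the ridge estimator above.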