Learning Linear Regression with Low-Rank Tasks in-Context

arXiv stat.ML / April 24, 2026


Key Points

  • The paper investigates how in-context learning (ICL) works when tasks share a common underlying structure, modeling this with a linear attention model trained on low-rank regression problems (a minimal sketch of this setup follows the list).
  • It derives an exact characterization of the prediction distribution and the generalization error in the high-dimensional limit.
  • The authors show that randomness from finite pre-training data creates an implicit regularization effect.
  • They identify a sharp phase transition in generalization error that is controlled by the structure of the tasks, offering a theoretical framework for how transformers learn to learn task structure.
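
To make the setup concrete, here is a minimal numerical sketch, not the paper's code: task vectors w = U s are confined to a shared r-dimensional subspace spanned by U, and predictions come from the reduced form ŷ = x_qᵀ Γ (Xᵀy)/n that one-layer linear attention is commonly simplified to on regression prompts. The dimensions, the noise level, and the choice Γ ∝ U Uᵀ (an "oracle" stand-in for what pretraining over many tasks would learn, not the paper's trained solution) are all illustrative assumptions.

```python
# Minimal sketch of ICL on low-rank linear regression tasks (illustrative,
# not the paper's code). Tasks w = U s live in a shared r-dim subspace, and
# a reduced-form one-layer linear attention predicts y_hat = x_q^T Gamma (X^T y) / n.
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 64, 4, 128          # ambient dimension, task rank, context length (assumed)

# Shared low-rank task structure: an orthonormal basis U of an r-dim subspace.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

def sample_task():
    """Draw a task vector w = U s lying in the shared subspace."""
    return U @ rng.standard_normal(r)

def sample_prompt(w, n, noise=0.1):
    """Draw n in-context examples (x_i, y_i) for task w."""
    X = rng.standard_normal((n, d)) / np.sqrt(d)
    y = X @ w + noise * rng.standard_normal(n)
    return X, y

def linear_attention_predict(Gamma, X, y, x_query):
    """Reduced-form one-layer linear attention: y_hat = x_q^T Gamma (X^T y) / n."""
    return x_query @ Gamma @ (X.T @ y) / len(y)

# Compare a structure-blind predictor with one aligned to the task subspace.
w = sample_task()
X, y = sample_prompt(w, n)
x_q = rng.standard_normal((500, d))   # many query points for a stable estimate

for name, Gamma in [("unstructured", d * np.eye(d)),
                    ("subspace-aware", d * U @ U.T)]:
    y_hat = linear_attention_predict(Gamma, X, y, x_q)
    err = np.mean((y_hat - x_q @ w) ** 2)
    print(f"{name:14s} mean squared error: {err:.4f}")
```

The subspace-aware predictor can only shrink the error here: since w lies in span(U), projecting the in-context estimate onto that subspace removes exactly the components of estimation noise outside it. This is the basic intuition behind exploiting shared task structure in-context.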

Abstract

In-context learning (ICL) is a key building block of modern large language models, yet its theoretical mechanisms remain poorly understood. It is particularly mysterious how ICL operates in real-world applications where tasks have a common structure. In this work, we address this problem by analyzing a linear attention model trained on low-rank regression tasks. Within this setting, we precisely characterize the distribution of predictions and the generalization error in the high-dimensional limit. Moreover, we find that statistical fluctuations in finite pre-training data induce an implicit regularization. Finally, we identify a sharp phase transition of the generalization error governed by task structure. These results provide a framework for understanding how transformers learn to learn the task structure.
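
The paper characterizes the generalization error in closed form in the high-dimensional limit; as a crude stand-in, the hedged Monte Carlo sketch below estimates the same quantity empirically for a subspace-aware reduced-form predictor as the context length n grows. All dimensions, the noise level, the trial count, and the choice of Γ are illustrative assumptions, and no phase-transition result is reproduced here.

```python
# Hedged Monte Carlo sketch (illustrative, not the paper's closed-form
# high-dimensional result): estimate the ICL generalization error of the
# reduced-form predictor y_hat = x_q^T Gamma (X^T y) / n for tasks drawn
# from a shared rank-r subspace, as the context length n grows.
import numpy as np

rng = np.random.default_rng(1)
d, r, noise, trials = 64, 4, 0.1, 200     # illustrative sizes

# Shared task structure and a predictor aligned with it (assumed, not trained).
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
Gamma = d * U @ U.T

for n in (16, 64, 256, 1024):
    errs = []
    for _ in range(trials):
        w = U @ rng.standard_normal(r)                 # task in the subspace
        X = rng.standard_normal((n, d)) / np.sqrt(d)   # context inputs
        y = X @ w + noise * rng.standard_normal(n)     # noisy context labels
        x_q = rng.standard_normal(d)                   # fresh query input
        y_hat = x_q @ Gamma @ (X.T @ y) / n
        errs.append((y_hat - x_q @ w) ** 2)
    print(f"n = {n:4d}   mean squared error ≈ {np.mean(errs):.4f}")
```

In this toy version the error simply decays with n; the paper's exact asymptotic analysis is what reveals the sharper phenomena, namely the implicit regularization induced by finite pre-training data and the phase transition governed by task structure, which a simulation like this can only hint at.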