Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

arXiv cs.CL / 4/13/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper studies in-context learning (ICL) by combining attention-head mechanistic analysis with a holistic decomposition into Task Recognition (TR) and Task Learning (TL).
  • It introduces Task Subspace Logit Attribution (TSLA) to identify which attention heads specialize in TR versus TL, and shows that these heads independently and effectively capture the corresponding ICL components (see the sketch after this list).
  • Correlation, ablation, and input-perturbation experiments provide evidence that TR and TL heads play distinct yet complementary roles in performing ICL.
  • Steering experiments using geometric analysis of hidden states suggest TR heads align hidden representations with a task subspace, while TL heads rotate those representations within the subspace toward the correct label.
  • The authors argue their framework unifies earlier ICL mechanism findings (e.g., induction heads and task vectors) with the TR–TL attention-head perspective for a more interpretable account.
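The summary does not spell out how TSLA is computed, but the general recipe of logit attribution restricted to a task subspace can be sketched. The snippet below is a minimal, hypothetical illustration, not the paper's implementation: `head_outputs`, `W_U`, and `label_ids` are placeholder names, and the task subspace is assumed to be the span of the label tokens' unembedding directions.

```python
import numpy as np

# Hypothetical sketch of task-subspace logit attribution.
# All shapes and names below are assumptions, not the paper's actual setup.
rng = np.random.default_rng(0)
n_layers, n_heads, d_model, vocab = 4, 8, 64, 1000
head_outputs = rng.normal(size=(n_layers, n_heads, d_model))  # per-head contribution to the final-token residual stream
W_U = rng.normal(size=(d_model, vocab))                       # unembedding matrix
label_ids = [17, 294, 551]                                    # placeholder token ids of the task's label words

# Task subspace: span of the label tokens' unembedding directions.
basis, _ = np.linalg.qr(W_U[:, label_ids])                    # (d_model, n_labels), orthonormal

# Attribute each head by how strongly its output writes into that subspace:
# project the head's output onto the subspace and measure the resulting
# logit contribution at the label tokens.
proj = (head_outputs @ basis) @ basis.T                       # (n_layers, n_heads, d_model)
label_logit_attr = proj @ W_U[:, label_ids]                   # (n_layers, n_heads, n_labels)
score = np.linalg.norm(label_logit_attr, axis=-1)             # (n_layers, n_heads)

# Heads with the largest scores are candidates for TR/TL specialization.
layers, heads = np.unravel_index(np.argsort(score, axis=None)[-5:], score.shape)
print(list(zip(layers.tolist(), heads.tolist())))
```

In practice the per-head outputs would come from a real model's residual stream rather than random arrays; the point is only that attribution is measured inside a task-defined subspace rather than over the full logit vector.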

Abstract

We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Using steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.
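To make the described geometry concrete, here is a toy sketch of two diagnostics one could track in a steering experiment: the fraction of a hidden state's norm lying in the task subspace (which TR-like heads would increase) and the cosine, within that subspace, to the correct label's direction (which TL-like heads would increase). Every name here (`basis`, `label_dir`, `h`, `head_out`) is a placeholder rather than the paper's setup.

```python
import numpy as np

# Toy illustration of the two geometric diagnostics; all quantities are synthetic.
rng = np.random.default_rng(1)
d_model, n_labels = 64, 3
basis, _ = np.linalg.qr(rng.normal(size=(d_model, n_labels)))  # orthonormal task-subspace basis
label_dir = basis[:, 0]                                        # direction of the correct label (assumed)
h = rng.normal(size=d_model)                                   # hidden state before steering

def subspace_alignment(x):
    """Fraction of the vector's norm that lies inside the task subspace."""
    return np.linalg.norm(basis.T @ x) / np.linalg.norm(x)

def label_cosine(x):
    """Cosine between the in-subspace component and the correct-label direction."""
    in_sub = basis.T @ x                                       # coordinates within the subspace
    return (in_sub @ (basis.T @ label_dir)) / np.linalg.norm(in_sub)

# Steering with a head's output direction: a TR-like head should mainly raise
# subspace_alignment, while a TL-like head should mainly raise label_cosine.
head_out = rng.normal(size=d_model)                            # placeholder for a head's output direction
for name, fn in [("alignment", subspace_alignment), ("label cosine", label_cosine)]:
    print(name, fn(h), "->", fn(h + head_out))
```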