Temporal Dependencies in In-Context Learning: The Role of Induction Heads
arXiv cs.CL / 4/3/2026
Key Points
- The paper studies how LLMs perform in-context learning, showing that several open-source models exhibit a serial-recall-like bias: they assign the highest probability to the token that follows a repeated token in the input sequence (+1 lag behavior).
- Through ablation experiments, it identifies “induction heads”—attention heads that attend to the token after a previous occurrence of the current token—as a key mechanistic driver of this temporal dependence pattern.
- Removing attention heads with high induction scores substantially reduces the +1 lag bias, while ablating randomly selected heads does not produce the same effect.
- The study further finds that high-induction-head ablation more strongly degrades few-shot prompted serial-recall performance than random-head ablation.
- Overall, the results provide a mechanistically specific link between induction heads and ordered temporal context retrieval in transformer-based in-context learning.
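The paper's exact induction-score definition is not given in this summary; as an illustrative sketch, the idea can be captured as follows. An idealized induction head, at position t, attends back to the position immediately after the most recent earlier occurrence of the current token, and a head's induction score is the attention mass it places on those "+1 lag" targets. The function and matrix shapes below are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def induction_targets(tokens):
    """For each position t, return the index of the token *after* the most
    recent earlier occurrence of tokens[t], or -1 if there is none.
    An idealized induction head attends from t to exactly that index."""
    last_seen = {}
    targets = []
    for t, tok in enumerate(tokens):
        prev = last_seen.get(tok, -1)
        targets.append(prev + 1 if prev >= 0 else -1)
        last_seen[tok] = t
    return targets

def induction_score(attn, tokens):
    """Mean attention mass a head places on its induction targets.

    attn: (T, T) row-stochastic attention matrix for one head
          (e.g. captured from a forward pass with hooks).
    Positions with no earlier repeat are skipped."""
    targets = induction_targets(tokens)
    masses = [attn[t, j] for t, j in enumerate(targets) if j >= 0]
    return float(np.mean(masses)) if masses else 0.0

# Toy check: in "A B C A", the second "A" (position 3) should target
# position 1, i.e. the token that followed the first "A".
tokens = ["A", "B", "C", "A"]
attn = np.zeros((4, 4))
attn[3, 1] = 1.0  # a perfect induction head puts all its mass there
```

Under this toy definition, the ablations described above amount to zeroing out the heads whose scores are highest and measuring how much the model's +1 lag bias drops relative to zeroing randomly chosen heads.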