Task Vector Geometry Underlies Dual Modes of Task Inference in Transformers
arXiv stat.ML / 5/6/2026
Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper investigates how transformers infer latent tasks from context using two modes: recognizing previously trained tasks and adapting to novel ones.
- Building on prior interpretability work that finds “task vectors” in middle-layer representations, the study provides a more rigorous theoretical link between internal task-vector geometry and external behavior.
- Using small transformers trained from scratch on synthetic latent-task sequence distributions, the authors derive a mathematical characterization of how training affects task-vector geometry.
- They report that in-distribution inference works by Bayesian task retrieval, forming convex combinations of learned task vectors, whereas out-of-distribution generalization relies on extrapolative task learning, with representations lying in a subspace nearly orthogonal to the span of the learned task vectors (see the sketch after this list).
- Overall, the results connect training distributions, task-vector geometry, and OOD generalization in a single unified framework, explaining why dual inference modes can coexist in one model.
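
Below is a minimal numpy sketch of the geometry described above, not the authors' actual method: all dimensions, variable names, and the stand-in posterior are illustrative assumptions. It shows why a posterior-weighted convex combination necessarily stays inside the span of the learned task vectors, while an OOD representation constructed in the orthogonal complement has near-zero alignment with that span.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 64, 8  # hidden dimension and number of trained tasks (illustrative sizes)

# Learned task vectors for the K training tasks, stacked as rows.
task_vectors = rng.normal(size=(K, d))
task_vectors /= np.linalg.norm(task_vectors, axis=1, keepdims=True)

# Orthonormal basis of the subspace spanned by the learned task vectors.
U, _, _ = np.linalg.svd(task_vectors.T, full_matrices=False)  # U: (d, K)

# --- Mode 1: in-distribution inference as Bayesian task retrieval ---
# The context induces a posterior over the K trained tasks; the inferred
# task vector is the posterior-weighted convex combination of learned ones.
log_evidence = rng.normal(size=K)        # stand-in for context log-likelihoods
posterior = np.exp(log_evidence)
posterior /= posterior.sum()             # weights are non-negative and sum to 1
v_in = posterior @ task_vectors          # convex combination, shape (d,)

# By construction v_in lies in the task-vector span: projecting onto the
# subspace reproduces it, so the residual is ~0.
residual = v_in - U @ (U.T @ v_in)
print("in-distribution residual norm:", np.linalg.norm(residual))

# --- Mode 2: OOD inference via a nearly orthogonal subspace ---
# Emulate a novel-task representation by sampling a vector and removing
# its component inside the task-vector subspace.
v_raw = rng.normal(size=d)
v_ood = v_raw - U @ (U.T @ v_raw)
v_ood /= np.linalg.norm(v_ood)

# Alignment with the task subspace is ~0, i.e. near-orthogonality.
print("OOD alignment with task subspace:", np.linalg.norm(U.T @ v_ood))
```

Running the sketch prints a residual near zero for the in-distribution vector and an alignment near zero for the OOD vector, mirroring the paper's claim that the two inference modes occupy complementary regions of representation space.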