What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

arXiv cs.LG / 4/17/2026

💬 Opinion · Models & Research

Key Points

  • The paper introduces “prolepsis,” a transformer behavior where the model commits early to a decision and later layers cannot correct it because specific task-related attention heads sustain the commitment.
  • Replicating earlier planning-site findings on open models (Gemma 2 2B and Llama 3.2 1B), the authors find that planning signals are invisible to six residual-stream analysis methods and that cross-layer transcoders (CLTs) are necessary to detect them.
  • They identify attention-head mechanisms that route the committed decision to the output, addressing a gap where attribution graphs previously failed to reveal the responsible routing.
  • The study separates requirements for search versus commitment, finding that search can work with ≤16 layers, while commitment needs more layers.
  • For factual recall, the authors observe a related recurring motif at a different depth, with no overlap between planning heads that recur and the top-10 factual recall heads, suggesting architectural modularity.
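
To make the Q1-style claim concrete (a signal that simple residual-stream methods may or may not pick up), here is a minimal sketch of one of the baseline approaches: a linear probe trained on residual-stream activations to predict the model's eventual commitment. Everything here is hypothetical — the activations are synthetic stand-ins; in a real replication they would come from a hooked forward pass over the open models named above.

```python
import numpy as np

# Hypothetical setup: residual-stream activations at one layer for n prompts,
# each labeled by which of two plans the model ultimately commits to.
# Synthetic data only; real activations would come from an instrumented model.
rng = np.random.default_rng(0)
d_model, n = 64, 400
labels = rng.integers(0, 2, size=n)            # binary commitment label per prompt
direction = rng.normal(size=d_model)           # assumed "planning direction" in the stream
acts = rng.normal(size=(n, d_model)) + 0.5 * np.outer(labels * 2 - 1, direction)

# Train/test split and a closed-form linear probe (least squares on +/-1 targets).
train, test = slice(0, 300), slice(300, n)
y = labels * 2.0 - 1.0
w, *_ = np.linalg.lstsq(acts[train], y[train], rcond=None)
preds = (acts[test] @ w > 0).astype(int)
acc = (preds == labels[test]).mean()
print(f"probe accuracy: {acc:.2f}")
```

The paper's point is that on real planning tasks probes like this (and five other residual-stream methods) fail to find the signal, which is why heavier CLT-based analysis is needed; the sketch only shows what the baseline measurement looks like.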

Abstract

When do transformers commit to a decision, and what prevents them from correcting it? We introduce prolepsis: a transformer commits early, task-specific attention heads sustain the commitment, and no layer corrects it. Replicating Lindsey et al.'s (2025) planning-site finding on open models (Gemma 2 2B, Llama 3.2 1B), we ask five questions. (Q1) Planning is invisible to six residual-stream methods; CLTs are necessary. (Q2) The planning-site spike replicates with identical geometry. (Q3) Specific attention heads route the decision to the output, filling a gap flagged as invisible to attribution graphs. (Q4) Search requires ≤16 layers; commitment requires more. (Q5) Factual recall shows the same motif at a different network depth, with zero overlap between recurring planning heads and the factual top-10. Prolepsis is architectural: the template is shared, the routing substrates differ. All experiments run on a single consumer GPU (16 GB VRAM).
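
The Q3 finding, that specific heads route the committed decision to the output, suggests a simple direct-effect measurement: score each attention head by how much its write into the residual stream, projected through the unembedding, supports the committed token. The sketch below illustrates that scoring with synthetic tensors; the shapes, the `committed` token id, and the head outputs are all hypothetical placeholders, not the paper's actual procedure.

```python
import numpy as np

# Hypothetical Q3-style head attribution: rank heads by the dot product of
# their residual-stream write (at the final position) with the committed
# token's unembedding direction. All tensors here are synthetic stand-ins.
rng = np.random.default_rng(1)
n_layers, n_heads, d_model, vocab = 4, 8, 32, 100
head_out = rng.normal(size=(n_layers, n_heads, d_model))  # per-head writes
W_U = rng.normal(size=(d_model, vocab))                   # unembedding matrix
committed = 42                                            # token the model committed to

# Direct-effect score per head: write vector dotted with the token's unembed column.
scores = head_out @ W_U[:, committed]                     # shape (n_layers, n_heads)
layer, head = np.unravel_index(np.argmax(scores), scores.shape)
print(f"top routing head: L{layer}H{head}, score {scores[layer, head]:.2f}")
```

A follow-up ablation (zeroing the top-scoring head's output and re-running the forward pass) would test whether that head is causally necessary for sustaining the commitment, which is the kind of gap the authors say attribution graphs alone left open.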