How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

arXiv cs.CL / 3/17/2026

Key Points

  • It introduces forced-completion probing to compare identical queries with correct and incorrect single-token continuations across all layers of four decoder-only models (1.5B-13B parameters).
  • It shows that correct and incorrect paths diverge via rotation on an approximate hypersphere, with displacement magnitudes staying similar while angular separation grows across layers.
  • It finds that models actively suppress the correct answer when faced with incorrect input, moving probability away from the right token rather than passively failing.
  • It observes that both effects are absent in smaller models and emerge at a parameter threshold of around 1.6B, suggesting a phase transition in factual processing capability.
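
The rotation-versus-rescaling distinction in the second point can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function name, the shared reference state `h_ref`, and the synthetic trajectories are assumptions made for illustration, since this summary does not say how displacement vectors are defined. The sketch measures, per layer, the angle between two displacement vectors and the ratio of their norms. Pure rotation would show a growing angle with a ratio that stays near 1.

```python
import numpy as np


def rotation_vs_rescaling(h_correct, h_incorrect, h_ref):
    """Compare two per-layer hidden-state trajectories.

    All inputs have shape (n_layers, d_model). h_ref is a hypothetical shared
    reference state (an assumption of this sketch, not the paper's definition).
    Returns (angle_deg, norm_ratio), each of shape (n_layers,).
    """
    d_c = np.asarray(h_correct, dtype=float) - h_ref
    d_i = np.asarray(h_incorrect, dtype=float) - h_ref
    norm_c = np.linalg.norm(d_c, axis=-1)
    norm_i = np.linalg.norm(d_i, axis=-1)
    # Guard against division by zero without perturbing non-degenerate values.
    cos = np.sum(d_c * d_i, axis=-1) / np.maximum(norm_c * norm_i, 1e-12)
    angle_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    norm_ratio = norm_i / np.maximum(norm_c, 1e-12)
    return angle_deg, norm_ratio


# Synthetic trajectories: unit-length displacements whose angle grows with depth.
thetas = np.radians([0.0, 30.0, 60.0, 90.0])
h_ref = np.zeros((4, 2))
h_correct = np.tile([1.0, 0.0], (4, 1))
h_incorrect = np.stack([np.cos(thetas), np.sin(thetas)], axis=-1)
angle_deg, norm_ratio = rotation_vs_rescaling(h_correct, h_incorrect, h_ref)
```

With real models, the three arrays would come from per-layer hidden states collected on the same query with the correct and the incorrect continuation. How those states are extracted and at which token position is left open here.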

Abstract

When a language model is fed a wrong answer, what happens inside the network? Current understanding treats truthfulness as a static property of individual-layer representations: a direction to be probed, a feature to be extracted. Less is known about the dynamics: how internal representations diverge across the full depth of the network when the model processes correct versus incorrect continuations. We introduce forced-completion probing, a method that presents identical queries with known correct and incorrect single-token continuations and tracks five geometric measurements across every layer of four decoder-only models (1.5B-13B parameters). We report three findings. First, correct and incorrect paths diverge through rotation, not rescaling: displacement vectors maintain near-identical magnitudes while their angular separation increases, meaning factual selection is encoded in direction on an approximate hypersphere. Second, the model does not passively fail on incorrect input; it actively suppresses the correct answer, driving internal probability away from the right token. Third, both phenomena are entirely absent below a parameter threshold and emerge at 1.6B, suggesting a phase transition in factual processing capability. These results show that factual constraint processing has a specific geometric character (rotational, not scalar; active, not passive) that is invisible to methods based on single-layer probes or magnitude comparisons.
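
One way to make the difference between passive failure and active suppression concrete is to compare the per-layer probability of the correct token against a neutral baseline. The sketch below assumes a logit-lens-style readout: each layer's hidden state is projected through a hypothetical unembedding matrix `W_U`. The abstract does not say how internal probability is measured, so the readout, the baseline trajectory `h_base`, and all function names are assumptions for illustration only.

```python
import numpy as np


def token_logprob(hidden, W_U, token_id):
    """Log-probability of token_id at each layer under a softmax readout.

    hidden: (n_layers, d_model); W_U: (d_model, vocab_size), a hypothetical
    unembedding matrix (assumed readout, not the paper's stated method).
    """
    logits = np.asarray(hidden, dtype=float) @ W_U
    # Subtract the per-layer maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_z = np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return (logits - log_z)[:, token_id]


def suppression_gap(h_base, h_incorrect, W_U, correct_id):
    """Baseline minus incorrect-path log-probability of the correct token.

    A positive value at a layer means the correct token is less likely on the
    incorrect path than under the baseline, which is what suppression would
    look like under these assumptions.
    """
    base = token_logprob(h_base, W_U, correct_id)
    wrong = token_logprob(h_incorrect, W_U, correct_id)
    return base - wrong


# Toy example: 2 layers, 3-token vocabulary, identity readout.
W_U = np.eye(3)
h_base = np.zeros((2, 3))                                  # uniform over tokens
h_incorrect = np.array([[0.0, 0.0, 0.0], [-2.0, 0.0, 0.0]])  # token 0 pushed down
gap = suppression_gap(h_base, h_incorrect, W_U, correct_id=0)
```

In the toy data the gap is zero at the first layer and positive at the second, mimicking a correct token pushed below its baseline probability with depth. A passive failure would instead leave the gap near zero.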