How Do Answer Tokens Read Reasoning Traces? Self-Reading Patterns in Thinking LLMs for Quantitative Reasoning
arXiv cs.CL / 4/22/2026
Key Points
- The paper investigates how thinking LLMs use “answer tokens” to read and integrate intermediate reasoning traces, focusing specifically on quantitative reasoning.
- Attention analysis shows correct answers follow a benign self-reading pattern (a forward drift along the reasoning trace plus sustained focus on semantic anchor points), while incorrect answers show diffuse, irregular attention.
- The authors interpret this pattern as a signature of internal certainty during decoding: the model commits to a plausible solution branch and pulls in the key supporting evidence.
- They introduce a training-free steering approach based on Self-Reading Quality (SRQ) scores, which combine geometric metrics (process control) with semantic metrics (content monitoring) to favor reliable inference; a hedged sketch of such a score follows this list.
- Experiments report consistent accuracy improvements from SRQ-driven steering compared with approaches that do not explicitly promote this self-reading quality.
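The paper's summary above does not spell out how SRQ is computed, so the following is only a minimal sketch under stated assumptions: that the geometric term can be approximated by how strongly the attention centroid of successive answer tokens drifts forward along the reasoning trace, that the semantic term can be approximated by attention mass on a set of anchor-token positions, and that "steering" is applied as best-of-N candidate selection. The function names, the weighting, and the anchor-selection rule are all illustrative, not the authors' method.

```python
import numpy as np

def srq_score(attn, anchor_positions, w_geo=0.5, w_sem=0.5):
    """Toy Self-Reading Quality (SRQ) score for one decoded answer.

    attn: (num_answer_tokens, num_trace_tokens) attention weights of the
          answer tokens over the reasoning trace; rows assumed to sum to 1.
    anchor_positions: indices of trace tokens treated as semantic anchors
          (e.g. intermediate numeric results). How anchors are chosen is an
          assumption of this sketch, not taken from the paper.
    """
    num_answer, num_trace = attn.shape
    positions = np.arange(num_trace)

    # Geometric term ("process control"): does the attention centroid drift
    # forward along the trace as answer decoding proceeds?
    centroids = attn @ positions                  # expected trace position per answer token
    steps = np.arange(num_answer)
    if num_answer > 1 and centroids.std() > 0:
        forward_drift = np.corrcoef(steps, centroids)[0, 1]   # in [-1, 1]
    else:
        forward_drift = 0.0
    geo = (forward_drift + 1.0) / 2.0             # rescale to [0, 1]

    # Semantic term ("content monitoring"): attention mass on anchor tokens.
    sem = attn[:, anchor_positions].sum(axis=1).mean()

    return w_geo * geo + w_sem * sem


def pick_by_srq(candidates):
    """Training-free steering as best-of-N selection: keep the candidate
    answer whose attention pattern scores highest under SRQ."""
    return max(candidates, key=lambda c: srq_score(c["attn"], c["anchors"]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic attention for two candidate answers over a 20-token trace.
    diffuse = rng.dirichlet(np.ones(20), size=5)  # irregular, spread-out attention
    focused = np.zeros((5, 20))
    for i in range(5):                            # forward-drifting, anchor-heavy attention
        focused[i, 3 * i: 3 * i + 2] = 0.3
        focused[i, 18] = 0.4                      # anchor token near the end of the trace
        focused[i] /= focused[i].sum()
    cands = [{"attn": diffuse, "anchors": [18], "text": "answer A"},
             {"attn": focused, "anchors": [18], "text": "answer B"}]
    print(pick_by_srq(cands)["text"])             # selects "answer B"
```

In this toy setup the focused candidate wins because it scores high on both terms; a real implementation would read attention from the model's decoding pass and could also use SRQ to reweight branches during generation rather than only to rank finished candidates.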