RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval

arXiv cs.CL / 4/13/2026


Key Points

RecaLLM is a set of reasoning language models designed to better leverage long-context inputs by explicitly coupling retrieval and reasoning.
The paper identifies a “lost-in-thought” bottleneck where reasoning that boosts performance also makes later in-context retrieval harder, even over short reasoning spans.
RecaLLM addresses this by interleaving reasoning with explicit in-context retrieval, alternating between generating intermediate steps and fetching evidence for subproblems.
It uses a negligible-overhead constrained decoding method that enables verbatim copying of evidence spans to improve grounding for subsequent generation.
Experiments on open-source LLMs show RecaLLM achieves strong results on RULER and HELMET, with consistent gains up to 128K tokens despite training on much shorter (≤10K token) samples.

Abstract

We propose RecaLLM, a set of reasoning language models post-trained to make effective use of long-context information. In-context retrieval, which identifies relevant evidence from context, and reasoning are deeply intertwined: retrieval supports reasoning, while reasoning often determines what must be retrieved. However, their interaction remains largely underexplored. In preliminary experiments on several open-source LLMs, we observe that in-context retrieval performance substantially degrades even after a short reasoning span, revealing a key bottleneck for test-time scaling that we refer to as lost-in-thought: reasoning steps that improve performance also make subsequent in-context retrieval more challenging. To address this limitation, RecaLLM interleaves reasoning with explicit in-context retrieval, alternating between reasoning and retrieving context information needed to solve intermediate subproblems. We introduce a negligible-overhead constrained decoding mechanism that enables verbatim copying of evidence spans, improving the grounding of subsequent generation. Trained on diverse lexical and semantic retrieval tasks, RecaLLM achieves strong performance on two long-context benchmarks, RULER and HELMET, significantly outperforming baselines. Notably, we observe consistent gains at context windows of up to 128K tokens using training samples of at most 10K tokens, far shorter than those used by existing long-context approaches, highlighting a promising path toward improving long-context performance without expensive long-context training data.
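The abstract does not say how the constrained decoding mechanism is implemented, so the following is only a minimal sketch of the general idea of verbatim span copying, not RecaLLM's actual method. It assumes a hypothetical "copy mode" in which the next token is restricted to those that keep the copied text a contiguous span of the context. The names `allowed_next_tokens`, `copy_span`, and `score_fn` are illustrative, and word strings stand in for token ids.

```python
from typing import Callable, Dict, List, Sequence, Set


def allowed_next_tokens(context: Sequence[str], copied: Sequence[str]) -> Set[str]:
    """Tokens that keep `copied` a verbatim, contiguous span of `context`."""
    n = len(copied)
    if n == 0:
        # Nothing copied yet: a span may start at any context position.
        return set(context)
    allowed: Set[str] = set()
    target = list(copied)
    # Find every occurrence of the copied prefix that still has a successor.
    for start in range(len(context) - n):
        if list(context[start:start + n]) == target:
            allowed.add(context[start + n])
    return allowed


def copy_span(
    context: Sequence[str],
    score_fn: Callable[[List[str]], Dict[str, float]],
    max_len: int,
) -> List[str]:
    """Greedy decoding in which every step is masked to valid continuations.

    `score_fn` is a stand-in (an assumption) for the model's next-token scores.
    """
    copied: List[str] = []
    while len(copied) < max_len:
        allowed = allowed_next_tokens(context, copied)
        if not allowed:
            # The copied span reached the end of the context.
            break
        scores = score_fn(copied)
        # Tokens outside `allowed` are never considered, however high they score.
        best = max(allowed, key=lambda tok: scores.get(tok, float("-inf")))
        copied.append(best)
    return copied


if __name__ == "__main__":
    context = "the key is 42 and the code is 7".split()
    toy_scores = {"hello": 10.0, "the": 3.0, "code": 2.0, "key": 1.0}
    print(copy_span(context, lambda prefix: toy_scores, max_len=4))
```

In this toy run the highest-scoring token, `hello`, is never emitted because it does not occur in the context; the mask forces the output to be an exact substring of the input. The linear scan is only for clarity. A low-overhead implementation would presumably rely on an index over the context rather than rescanning it at every step, but that is a guess and not something the abstract states.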


