The Library Theorem: How External Organization Governs Agentic Reasoning Capacity

arXiv cs.AI, 2026-03-24


Key Points

  • The paper studies “externalized” agent reasoning by treating the transformer context window as an I/O page and analyzing how structured retrieval via indexed external memory scales versus sequential scanning.
  • It proves that tool-augmented agents with reasoning-state indexing reduce retrieval cost from linear ($\Omega(N)$) to logarithmic ($O(\log_b N)$) page reads per query, and cumulative cost over $T$ reasoning steps from quadratic to near-linear (see the sketch after this list).
  • Controlled benchmarks across multiple content types and store sizes show the predicted gains, including an abstract-content regime where the indexed agent reads a median of one page regardless of store size.
  • The experiments reveal two failure modes: without an index, models struggle to sustain efficient search at scale (even the stronger model's near-optimal binary search over sorted pages loses to the index by $5\times$), and on familiar encyclopedic content, models bypass the retrieval protocol and answer from parametric memory, driving excessive token usage even when the index works.
  • The authors argue for a separation of concerns: use language models to construct the index, where semantic understanding helps, and deterministic algorithms to traverse it, avoiding the protocol-skipping that semantic familiarity invites.
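To make the per-query gap concrete, here is a minimal Python sketch, not the authors' code: it splits a sorted store into fixed-size "pages", builds a b-ary index bottom-up (the half the paper assigns to the language model), and traverses it deterministically (the half it assigns to algorithms), counting page reads for each strategy. `PAGE_SIZE`, the fanout `B`, and every function name here are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the authors' code): count context-window "page" reads
# for three retrieval strategies over a sorted store, mirroring the paper's
# I/O framing. PAGE_SIZE, the fanout B, and all names here are assumptions.
from bisect import bisect_right

PAGE_SIZE = 32  # items per "page" (illustrative)
B = 32          # index branching factor b (illustrative)

def paginate(items):
    """Split a sorted item list into fixed-size pages."""
    return [items[i:i + PAGE_SIZE] for i in range(0, len(items), PAGE_SIZE)]

def scan_reads(pages, key):
    """Sequential scan: read pages in order until the key appears. Omega(N)."""
    for reads, page in enumerate(pages, start=1):
        if key in page:
            return reads
    return len(pages)

def binary_reads(pages, key):
    """Sorted pages, no index: binary search over pages, ~log2(N/PAGE_SIZE)."""
    lo, hi, reads = 0, len(pages) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1  # each probe reads one page
        if key < pages[mid][0]:
            hi = mid - 1
        elif key > pages[mid][-1]:
            lo = mid + 1
        else:
            return reads
    return reads

def build_index(pages):
    """The LM's half of the paper's split: build a b-ary index bottom-up.
    Each node is one page of (separator_key, child_position) pairs."""
    keys = [page[0] for page in pages]          # smallest key per leaf
    children = list(range(len(pages)))
    levels = []
    while len(children) > 1:
        nodes = [list(zip(keys[i:i + B], children[i:i + B]))
                 for i in range(0, len(children), B)]
        levels.append(nodes)
        keys = [node[0][0] for node in nodes]
        children = list(range(len(nodes)))
    return levels                                # levels[-1] holds the root

def indexed_reads(levels, pages, key):
    """The deterministic half: walk root-to-leaf, one page read per node,
    plus one read for the leaf. O(log_b N) total."""
    reads, pos = 0, 0
    for nodes in reversed(levels):
        node = nodes[pos]
        reads += 1                               # read this index page
        i = bisect_right([k for k, _ in node], key) - 1
        pos = node[max(i, 0)][1]                 # descend to chosen child
    return reads + 1                             # final leaf read

for n in (50, 500, 5_000):                       # the paper's store-size range
    pages = paginate(list(range(n)))
    levels = build_index(pages)
    key = n - 1                                  # worst case for the scan
    print(f"N={n:>5}: scan={scan_reads(pages, key):>4}  "
          f"binary={binary_reads(pages, key):>2}  "
          f"indexed={indexed_reads(levels, pages, key)}")
```

At store sizes 50/500/5,000 this prints scan reads of 2/16/157, binary reads of 2/5/8, and indexed reads of 2/2/3: the linear, logarithmic, and near-constant shapes the theorem predicts, though the constants differ from the paper's setup, whose indexed agent reports a median of one read, presumably because the index root already sits in the agent's context.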

Abstract

Externalized reasoning is already exploited by transformer-based agents through chain-of-thought, but structured retrieval -- indexing over one's own reasoning state -- remains underexplored. We formalize the transformer context window as an I/O page and prove that tool-augmented agents with indexed external memory achieve exponentially lower retrieval cost than agents restricted to sequential scanning: $O(\log_b N)$ versus $\Omega(N)$ page reads per query, and $O(T \log_b T)$ versus $\Theta(T^2)$ cumulative cost over $T$ reasoning steps -- a gap that widens as deliberation deepens. We test these predictions on a controlled lookup benchmark across three content types -- random hashes, ordered integers, and encyclopedia entries -- varying store size from 50 to 5,000 items, and replicate key conditions across two model generations (GPT-4o-mini and GPT-5.4). On abstract content, the indexed agent achieves a median of 1 page read regardless of store size, confirming the $O(1)$ prediction. Sorted pages without an index fail to close the gap: the weaker model cannot sustain binary search at scale, and the stronger model achieves near-optimal $\log_2 N$ search but still loses to the index by $5\times$. On familiar content (encyclopedia entries), a competing failure mode emerges: the model recognizes the domain, bypasses the retrieval protocol, and generates answers from parametric memory, producing catastrophic token expenditure even when the index is sound. This parametric memory competition dissociates the two cognitive operations that indexing combines: understanding content (where language models excel) and following navigational protocols (where they fail when understanding tempts them to shortcut). The result argues for a separation of concerns: use language models for index construction, where semantic understanding helps, and deterministic algorithms for index traversal, where it hurts.
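The cumulative gap follows in one step, under the assumption (implicit in the abstract's framing) that the reasoning store grows roughly linearly with steps, so the query at step $t$ searches $\Theta(t)$ items:

$$\text{sequential: } \sum_{t=1}^{T} \Theta(t) = \Theta(T^2), \qquad \text{indexed: } \sum_{t=1}^{T} O(\log_b t) = O(T \log_b T).$$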
