Coding Agents are Effective Long-Context Processors

arXiv cs.CL / 3/24/2026


Key Points

  • The paper argues that current long-context performance in LLMs degrades because latent, uninterpretable attention is not an effective mechanism for processing long documents.
  • It proposes externalizing long-context processing into explicit, executable interactions by using coding agents that organize text in file systems and manipulate it with native tools.
  • Evaluations on long-context reasoning, retrieval-augmented generation, and open-domain QA over corpora up to three trillion tokens show coding agents outperform prior published state of the art by an average of 17.3% across benchmarks.
  • The authors attribute the gains to coding agents’ native tool proficiency (using executable code/terminal commands) and file-system familiarity (treating massive corpora as directory structures).
  • The findings suggest long-context capabilities can be improved without relying solely on semantic search or context-window scaling, motivating new long-context processing directions for LLM systems.
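The mechanism the key points describe can be illustrated with a minimal sketch. The helper names (`externalize`, `grep`) and the chunking scheme are hypothetical, not from the paper; they merely mimic how a coding agent might lay a long context out on disk and then search it with a grep-style exact-match pass instead of holding everything in its context window:

```python
import tempfile
from pathlib import Path

def externalize(text: str, workdir: Path, chunk_lines: int = 100) -> list[Path]:
    """Split a long context into numbered chunk files (hypothetical helper,
    mirroring how an agent might organize a corpus in a file system)."""
    lines = text.splitlines()
    paths = []
    for i in range(0, len(lines), chunk_lines):
        p = workdir / f"chunk_{i // chunk_lines:04d}.txt"
        p.write_text("\n".join(lines[i:i + chunk_lines]))
        paths.append(p)
    return paths

def grep(pattern: str, workdir: Path) -> list[tuple[str, int, str]]:
    """Exact-match search over the chunk files, analogous to `grep -rn`:
    returns (filename, line number, matching line) hits."""
    hits = []
    for p in sorted(workdir.glob("chunk_*.txt")):
        for lineno, line in enumerate(p.read_text().splitlines(), start=1):
            if pattern in line:
                hits.append((p.name, lineno, line))
    return hits

# Toy "long context": 250 filler lines plus one line worth retrieving.
workdir = Path(tempfile.mkdtemp())
doc = "\n".join(f"line {i}" for i in range(250)) + "\nneedle: the answer is 42"
externalize(doc, workdir)
print(grep("needle", workdir))
```

The point of the sketch is the contrast with attention: the search is explicit, executable, and inspectable, and its cost does not grow with the model's context window.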

Abstract

Large Language Models (LLMs) have demonstrated remarkable progress in scaling to access massive contexts. However, this access is mediated by latent, uninterpretable attention mechanisms, and LLMs fail to process long contexts effectively, exhibiting significant performance degradation as context length increases. In this work, we study whether long-context processing can be externalized from latent attention into explicit, executable interactions by allowing coding agents to organize text in file systems and manipulate it with their native tools. We evaluate off-the-shelf frontier coding agents as a general interface for tasks that require processing long contexts, including long-context reasoning, retrieval-augmented generation, and open-domain question answering over large-scale corpora containing up to three trillion tokens. Across multiple benchmarks, these agents outperform the published state of the art by 17.3% on average. We attribute this efficacy to two key factors: native tool proficiency, which enables agents to leverage executable code and terminal commands rather than passive semantic queries, and file system familiarity, which allows them to navigate massive text corpora as directory structures. These findings suggest that delegating long-context processing to coding agents offers an effective alternative to semantic search or context window scaling, opening new directions for long-context processing in LLMs.
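The "file system familiarity" factor in the abstract can also be sketched concretely. The corpus layout, file names, and helper names below are hypothetical; the sketch only shows the pattern of first listing a corpus as a directory tree (as an agent would with `ls -R` or `find`) and then reading a bounded slice of one relevant file, rather than loading the whole corpus into context:

```python
import os
import tempfile
from pathlib import Path

# Hypothetical corpus layout: topic subdirectories holding text files.
root = Path(tempfile.mkdtemp())
(root / "physics").mkdir()
(root / "history").mkdir()
(root / "physics" / "relativity.txt").write_text("E = mc^2")
(root / "history" / "rome.txt").write_text("Founded 753 BC")

def navigate(root: Path) -> dict[str, list[str]]:
    """Map the corpus as {relative directory: sorted file names}, the way an
    agent would survey a tree before deciding which files to open."""
    tree = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        tree[rel] = sorted(filenames)
    return tree

def read_slice(path: Path, max_chars: int = 200) -> str:
    """Read only a bounded slice of one chosen file, keeping the amount of
    text pulled into the model's context small and deliberate."""
    return path.read_text()[:max_chars]

print(navigate(root))
print(read_slice(root / "physics" / "relativity.txt"))
```

Navigation plus targeted reads is what lets the corpus scale to trillions of tokens: only the directory listing and the selected slices ever enter the agent's context.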