Decocted Experience Improves Test-Time Inference in LLM Agents

arXiv cs.AI / 4/7/2026


Key Points

  • The paper addresses how to improve LLM agent performance without updating model parameters, focusing on test-time inference enhancements that reduce wasted computation and suboptimal exploration.
  • It proposes using input context as a complementary scaling axis alongside test-time compute, arguing that the quality of context construction is crucial for guiding agent reasoning.
  • The authors introduce and analyze “decocted experience,” a mechanism that extracts the essence of past experience, organizes it coherently, and retrieves salient parts to build better prompts for reasoning and agentic behavior.
  • The work systematically studies experience-augmented agents, including how performance scales with accumulated experience, what characterizes effective context, and which data structures support context construction.
  • Experiments validate the approach across math reasoning, web browsing, and software engineering tasks, showing that decocted experience improves test-time inference outcomes for LLM agents.

Abstract

There is growing interest in improving LLMs without updating model parameters. One well-established direction is test-time scaling, where increased inference-time computation (e.g., longer reasoning, sampling, or search) is used to improve performance. However, for complex reasoning and agentic tasks, naively scaling test-time compute can substantially increase cost and still lead to wasted budget on suboptimal exploration. In this paper, we explore context as a complementary scaling axis for improving LLM performance, and systematically study how to construct better inputs that guide reasoning through experience. We show that effective context construction critically depends on “decocted experience.” We present a detailed analysis of experience-augmented agents, studying how to derive context from experience, how performance scales with accumulated experience, what characterizes good context, and which data structures best support context construction. We identify “decocted experience” as a key mechanism for effective context construction: extracting essence from experience, organizing it coherently, and retrieving salient information to build effective context. We validate our findings across reasoning and agentic tasks, including math reasoning, web browsing, and software engineering.
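The three steps the abstract names for decocted experience (extract essence, organize coherently, retrieve salient parts to build context) can be illustrated with a toy sketch. This is not the paper's implementation: the class name, the keyword-overlap retrieval, and the prompt template below are all illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ExperienceStore:
    """Toy store of distilled past experiences.

    Illustrative only: names and scoring logic are assumptions, not the
    paper's method. Each entry keeps a keyword set ("essence") plus a
    short lesson, discarding the raw trajectory.
    """
    entries: list = field(default_factory=list)  # (keyword set, lesson) pairs

    def decoct(self, trajectory: str, lesson: str) -> None:
        # "Extract essence": compress a trajectory into keywords + lesson.
        keywords = set(trajectory.lower().split())
        self.entries.append((keywords, lesson))

    def retrieve(self, task: str, k: int = 2) -> list:
        # "Retrieve salient parts": rank lessons by keyword overlap
        # between the stored essence and the new task description.
        words = set(task.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(e[0] & words),
                        reverse=True)
        return [lesson for _, lesson in ranked[:k]]

    def build_context(self, task: str) -> str:
        # "Construct context": prepend retrieved lessons to the task prompt.
        hints = "\n".join(f"- {lesson}" for lesson in self.retrieve(task))
        return f"Relevant experience:\n{hints}\n\nTask: {task}"

# Example usage: two past experiences, one relevant to the new task.
store = ExperienceStore()
store.decoct("solve quadratic equation by factoring",
             "check the discriminant before factoring")
store.decoct("browse web page click login",
             "verify the url before clicking")
print(store.build_context("solve this quadratic equation"))
```

A real system would replace the keyword overlap with learned or embedding-based retrieval, but the shape is the same: distillation at storage time, salience-based selection at inference time, and context assembly just before the model call.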