How Much LLM Does a Self-Revising Agent Actually Need?

arXiv cs.AI / 4/10/2026


Key Points

  • The paper investigates how much of an LLM-based self-revising agent’s competence comes from the LLM itself versus explicit external structure such as world modeling, planning, and reflection.
  • It proposes a declared reflective runtime protocol that externalizes agent state, confidence signals, guarded actions, and hypothetical transitions into an inspectable structure.
  • Using noisy Collaborative Battleship experiments across progressively structured agents, the authors decompose performance into four components: posterior belief tracking, explicit world-model planning, symbolic in-episode reflection, and sparse LLM-based revision.
  • Explicit world-model planning produces the largest gains, raising win rate by 24.1 percentage points and F1 by 0.017 over a greedy posterior-following baseline.
  • Conditional LLM revision, applied at about 4.3% of turns, yields only a small, non-monotonic effect (F1 rises by 0.005 while wins drop from 31 to 29 of 54 games), suggesting that sparse revision is not reliably net-positive and underscoring the value of the proposed evaluation methodology.
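
To make the idea of externalized state, confidence signals, and guarded actions more concrete, here is a minimal Python sketch. It is not the paper's runtime: the names `AgentState`, `update_confidence`, `step`, the moving-average confidence rule, and the 0.4 threshold are all assumptions chosen for illustration. The sketch follows the greedy posterior choice by default and consults a revision callback (which could wrap an LLM call) only when tracked confidence falls below the threshold, with a guard that rejects invalid proposals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[int, int]


@dataclass
class AgentState:
    """Externalized, inspectable runtime state (hypothetical fields)."""
    posterior: Dict[Cell, float]          # cell -> estimated hit probability
    confidence: float = 1.0               # how well recent predictions held up
    log: List[str] = field(default_factory=list)


def greedy_action(state: AgentState) -> Cell:
    # Greedy posterior-following: pick the cell with the highest hit probability.
    return max(state.posterior, key=state.posterior.get)


def update_confidence(state: AgentState, predicted_hit_prob: float,
                      was_hit: bool, rate: float = 0.3) -> None:
    # Prediction tracking (assumed rule): exponential moving average of how
    # much probability the agent assigned to the outcome that actually occurred.
    accuracy = predicted_hit_prob if was_hit else 1.0 - predicted_hit_prob
    state.confidence = (1.0 - rate) * state.confidence + rate * accuracy


def step(state: AgentState,
         revise: Callable[[AgentState], Optional[Cell]],
         threshold: float = 0.4) -> Cell:
    action = greedy_action(state)
    # Confidence gating: the revision callback is consulted only when
    # confidence is low, which keeps revisions sparse.
    if state.confidence < threshold:
        proposal = revise(state)
        # Guard: accept the proposal only if it names a cell still in play.
        if proposal is not None and proposal in state.posterior:
            state.log.append(f"revised {action} -> {proposal}")
            return proposal
        state.log.append("revision rejected by guard")
    return action
```

Because every decision passes through `state.log` and `state.confidence`, the fraction of turns on which revision fired can be read off the state directly, which is the kind of inspection the key points describe.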

Abstract

Recent LLM-based agents often place world modeling, planning, and reflection inside a single language model loop. This can produce capable behavior, but it makes a basic scientific question difficult to answer: which part of the agent's competence actually comes from the LLM, and which part comes from explicit structure around it? We study this question not by claiming a general answer, but by making it empirically tractable. We introduce a declared reflective runtime protocol that externalizes agent state, confidence signals, guarded actions, and hypothetical transitions into inspectable runtime structure. We instantiate this protocol in a declarative runtime and evaluate it on noisy Collaborative Battleship [4] using four progressively structured agents over 54 games (18 boards × 3 seeds). The resulting decomposition isolates four components: posterior belief tracking, explicit world-model planning, symbolic in-episode reflection, and sparse LLM-based revision. Across this decomposition, explicit world-model planning improves substantially over a greedy posterior-following baseline (+24.1pp win rate, +0.017 F1). Symbolic reflection operates as a real runtime mechanism (with prediction tracking, confidence gating, and guarded revision actions), even though its current revision presets are not yet net-positive in aggregate. Adding conditional LLM revision at about 4.3% of turns yields only a small and non-monotonic change: average F1 rises slightly (+0.005) while win rate drops (31 → 29 out of 54). These results suggest a methodological contribution rather than a leaderboard claim: externalizing reflection turns otherwise latent agent behavior into inspectable runtime structure, allowing the marginal role of LLM intervention to be studied directly.
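
The win counts in the abstract can be converted to percentage points with a few lines of Python. This is only a worked check of the reported figures (54 games, 31 and 29 wins); the helper name `win_rate` is introduced here for illustration.

```python
GAMES = 18 * 3  # boards x seeds, as stated in the abstract


def win_rate(wins: int, games: int = GAMES) -> float:
    """Return the win rate as a percentage."""
    return 100.0 * wins / games


before, after = 31, 29  # wins without and with conditional LLM revision
delta_pp = win_rate(after) - win_rate(before)

print(f"{win_rate(before):.1f}% -> {win_rate(after):.1f}% ({delta_pp:+.1f} pp)")
# prints: 57.4% -> 53.7% (-3.7 pp)
```

With 54 games, a single game is worth about 1.85 percentage points, so the two-game drop amounts to roughly 3.7 pp. By the same arithmetic, the reported +24.1 pp planning gain is consistent with a 13-game difference (13/54 ≈ 24.1%), although the abstract does not state those win counts directly.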