Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents

arXiv cs.AI / 4/7/2026


Key Points

  • The paper argues that tool-augmented LLM agents implemented with reactive execution repeatedly recompute reasoning after each observation, leading to higher latency and compounding error sensitivity.
  • It proposes Profile-Then-Reason (PTR), where an LLM first creates an explicit workflow, deterministic/guarded operators execute it, a verifier checks the resulting trace, and repair is triggered only if the workflow becomes unreliable.
  • PTR is formalized as a bounded pipeline (profile, routing, execution, verification, repair, reasoning) with a constrained number of LLM calls—two in the nominal case and three in the worst case under bounded repair.
  • Experiments on six benchmarks using four language models show PTR outperforms a ReAct baseline in 16 of 24 configurations, with gains especially strong on retrieval-heavy and decomposition-heavy tasks.
  • The study concludes that reactive execution can still be preferable when high performance requires substantial online adaptation beyond the initially planned workflow.

Abstract

Large language model agents that use external tools are often implemented through reactive execution, in which reasoning is repeatedly recomputed after each observation, increasing latency and sensitivity to error propagation. This work introduces Profile-Then-Reason (PTR), a bounded execution framework for structured tool-augmented reasoning, in which a language model first synthesizes an explicit workflow, deterministic or guarded operators execute that workflow, a verifier evaluates the resulting trace, and repair is invoked only when the original workflow is no longer reliable. A mathematical formulation is developed in which the full pipeline is expressed as a composition of profile, routing, execution, verification, repair, and reasoning operators; under bounded repair, the number of language-model calls is restricted to two in the nominal case and three in the worst case. Experiments against a ReAct baseline on six benchmarks and four language models show that PTR achieves a pairwise exact-match advantage in 16 of 24 configurations. The results indicate that PTR is particularly effective on retrieval-centered and decomposition-heavy tasks, whereas reactive execution remains preferable when success depends on substantial online adaptation.
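The bounded control loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the operator names (`profile`, `execute`, `verify`, `repair`, `reason`) follow the paper's vocabulary, but every signature and stub below is a hypothetical stand-in for the real components.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PTRAgent:
    """Hypothetical sketch of the Profile-Then-Reason control loop."""
    llm: Callable[[str], str]  # black-box language model
    llm_calls: int = 0         # tracks the bounded call budget

    def _call_llm(self, prompt: str) -> str:
        self.llm_calls += 1
        return self.llm(prompt)

    def profile(self, task: str) -> List[str]:
        # LLM call 1: synthesize an explicit workflow (here, ";"-separated steps).
        return self._call_llm(f"plan: {task}").split(";")

    def execute(self, workflow: List[str]) -> List[str]:
        # Deterministic/guarded operators run the workflow; no LLM calls.
        return [f"obs({step.strip()})" for step in workflow]

    def verify(self, observations: List[str]) -> bool:
        # Deterministic trace check; here: every step produced an observation.
        return bool(observations) and all(observations)

    def repair(self, task: str) -> List[str]:
        # Bounded repair: at most one extra LLM call to re-plan.
        return self._call_llm(f"replan: {task}").split(";")

    def reason(self, task: str, observations: List[str]) -> str:
        # Final LLM call: answer from the verified trace.
        return self._call_llm(f"answer {task} given {observations}")

    def run(self, task: str) -> str:
        workflow = self.profile(task)              # call 1
        observations = self.execute(workflow)
        if not self.verify(observations):          # repair only on failure
            workflow = self.repair(task)           # extra call (worst case)
            observations = self.execute(workflow)
        answer = self.reason(task, observations)   # call 2 nominal, 3 worst
        assert self.llm_calls <= 3                 # bounded-pipeline guarantee
        return answer
```

When verification passes, the model is consulted exactly twice (profile, then reason), matching the nominal bound; a failed trace adds one repair call, giving the worst-case bound of three. This contrasts with a ReAct-style loop, where the model is invoked once per observation.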