Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents

arXiv cs.CL / 4/10/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The proposed method, TrACE, is a training-free controller that adaptively allocates inference-time compute according to the difficulty of each step, using inter-rollout agreement among candidate next actions as its signal.
  • At each step it samples a small set of candidate actions; if they agree (high agreement) it commits immediately, and if they disagree (low agreement) it samples additional rollouts up to a configurable cap and commits to the plurality action.
  • Its distinguishing feature is that it requires no learned components, external verifiers, or human labels, instead exploiting the hypothesis that the model's own output consistency encodes step-level success and difficulty information.
  • In evaluations with Qwen 2.5 3B Instruct (on CPU), it matched or came close to the accuracy of fixed-budget self-consistency on both GSM8K and MiniHouse while substantially reducing the number of LLM calls.
  • It is positioned as one of the earliest efforts to evaluate training-free, per-step adaptive compute control for generative AI agents on multi-step sequential decision tasks.
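The per-step loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `sample_action` stands in for one LLM rollout that returns a candidate action string, and the names `k_init`, `k_max`, and `tau` (the agreement threshold) are assumed parameter names for exposition.

```python
from collections import Counter

def trace_step(sample_action, k_init=4, k_max=8, tau=0.75):
    """One TrACE-style decision step (illustrative sketch).

    sample_action: zero-arg callable returning one candidate action
    string (e.g. the action extracted from one LLM rollout).
    Returns (chosen_action, number_of_calls_used).
    """
    # Initial small batch of candidate actions.
    actions = [sample_action() for _ in range(k_init)]
    while True:
        top, count = Counter(actions).most_common(1)[0]
        agreement = count / len(actions)
        # High agreement -> easy step: commit immediately.
        # Otherwise escalate one rollout at a time up to the cap,
        # then commit to the plurality action.
        if agreement >= tau or len(actions) >= k_max:
            return top, len(actions)
        actions.append(sample_action())
```

On an easy step where all initial rollouts agree, the controller spends only `k_init` calls; only contested steps pay for the full `k_max` budget.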

Abstract

Inference-time compute scaling has emerged as a powerful technique for improving the reliability of large language model (LLM) agents, but existing methods apply compute uniformly: every decision step receives the same budget regardless of its difficulty. We introduce TrACE (Trajectorical Adaptive Compute via agrEement), a training-free controller that allocates LLM calls adaptively across agent timesteps by measuring inter-rollout action agreement. At each step, TrACE samples a small set of candidate next actions and measures how consistently the model commits to the same action. High agreement signals an easy decision; the controller commits immediately. Low agreement signals uncertainty; the controller samples additional rollouts up to a configurable cap before committing to the plurality action. No learned components, no external verifier, and no human labels are required. We evaluate TrACE against greedy decoding and fixed-budget self-consistency (SC-4, SC-8) on two benchmarks spanning single-step reasoning (GSM8K, n=50) and multi-step household navigation (MiniHouse, n=30), using a Qwen 2.5 3B Instruct model running on CPU. TrACE-4 matches SC-4 accuracy while using 33% fewer LLM calls on GSM8K and 39% fewer on MiniHouse. TrACE-8 matches SC-8 accuracy with 55% fewer calls on GSM8K and 65% fewer on MiniHouse. We further show that inter-rollout agreement is a reliable signal of step-level success, validating the core hypothesis that the model's own output consistency encodes difficulty information that can be exploited without training. TrACE is the first training-free, per-timestep adaptive-compute controller for LLM agents to be evaluated on multi-step sequential decision tasks.
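A back-of-envelope model shows where the reported call savings come from. This is illustrative arithmetic only, not the paper's accounting: it assumes a fraction `p_easy` of steps commit after the initial batch of `k_init` calls and the rest escalate all the way to the cap `k_max`, ignoring intermediate stopping points.

```python
def expected_calls(p_easy, k_init=4, k_max=8):
    """Expected LLM calls per step under the simplified two-outcome
    model: commit at k_init with probability p_easy, else pay k_max."""
    return p_easy * k_init + (1 - p_easy) * k_max

def savings_vs_fixed(p_easy, k_init=4, k_max=8):
    """Fractional call savings relative to a fixed budget of k_max
    (i.e. fixed-budget self-consistency at the cap)."""
    return 1 - expected_calls(p_easy, k_init, k_max) / k_max
```

For example, if 90% of steps are easy, the adaptive controller uses 0.9 × 4 + 0.1 × 8 = 4.4 calls per step on average, a 45% saving over a fixed budget of 8; the more easy steps a benchmark contains, the closer the savings approach 1 − k_init/k_max.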