SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval

arXiv cs.AI / 4/17/2026


Key Points

  • SGA-MCTS reframes LLM multi-step planning as a non-parametric retrieval problem to avoid the trade-off between slow inference-time search and limited generalization from supervised fine-tuning.
  • The method uses offline Monte Carlo Tree Search to generate high-fidelity solution trajectories, then distills them into reusable State-Goal-Action (SGA) atoms that are de-lexicalized to abstract away domain-specific details.
  • At inference time, a retrieval-augmented hybrid symbolic-semantic agent fetches relevant SGAs and re-grounds them into the current context as soft reasoning hints, improving planning without heavy online search.
  • Experiments on complex benchmarks report that frozen, open-weights models using SGA-MCTS can reach performance comparable to state-of-the-art systems (e.g., GPT-5) without task-specific fine-tuning.
  • By amortizing expensive search costs offline, SGA-MCTS aims to deliver “System 2” reasoning depth at “System 1” inference speed, making real-time autonomous planning more scalable.
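The article does not include code, but the core data structure it describes — a State-Goal-Action atom whose concrete entities are abstracted into symbolic slots — can be sketched in a few lines. The names below (`SGAAtom`, `delexicalize`, `make_atom`, the slot labels) are hypothetical illustrations, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SGAAtom:
    """A State-Goal-Action atom: a reusable, de-lexicalized planning primitive."""
    state: str
    goal: str
    action: str

def delexicalize(text: str, entities: dict[str, str]) -> str:
    """Replace concrete entity mentions with symbolic slots like <OBJ>.

    Longer entities are replaced first so that overlapping mentions
    (e.g. "red key" vs. "key") do not clobber each other.
    """
    for slot, entity in sorted(entities.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(entity, f"<{slot}>")
    return text

def make_atom(state: str, goal: str, action: str,
              entities: dict[str, str]) -> SGAAtom:
    """Distill one trajectory step into a de-lexicalized SGA atom."""
    return SGAAtom(
        state=delexicalize(state, entities),
        goal=delexicalize(goal, entities),
        action=delexicalize(action, entities),
    )

# Example: one step from a (hypothetical) MCTS-distilled trajectory.
atom = make_atom(
    state="the red key is in the drawer",
    goal="unlock the red door",
    action="take the red key from the drawer",
    entities={"OBJ": "red key", "LOC": "drawer", "TGT": "red door"},
)
print(atom.action)  # take the <OBJ> from the <LOC>
```

The point of the abstraction is that the resulting atom ("take the <OBJ> from the <LOC>") carries the causal logic of the step while discarding domain-specific surface details, so it can later be re-grounded in an entirely different task.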

Abstract

LLM-powered systems require complex multi-step decision-making abilities to solve real-world tasks, yet current planning approaches face a trade-off between the high latency of inference-time search and the limited generalization of supervised fine-tuning. To address this limitation, we introduce **SGA-MCTS**, a framework that casts LLM planning as non-parametric retrieval. Offline, we leverage Monte Carlo Tree Search (MCTS) to explore the solution space and distill high-fidelity trajectories into State-Goal-Action (SGA) atoms. These atoms are de-lexicalized primitives that abstract concrete entities into symbolic slots, preserving reusable causal logic while discarding domain-specific noise. Online, a retrieval-augmented agent employs a hybrid symbolic-semantic mechanism to fetch relevant SGAs and re-ground them into the current context as soft reasoning hints. Empirical results on complex benchmarks demonstrate that this paradigm enables frozen, open-weights models to match the performance of state-of-the-art systems (e.g., GPT-5) without task-specific fine-tuning. By effectively amortizing the heavy computational cost of search, SGA-MCTS achieves System 2 reasoning depth at System 1 inference speeds, rendering autonomous planning both scalable and feasible in real time.
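The abstract describes a two-stage online mechanism: a symbolic filter (keep only atoms whose slots the current context can bind) followed by semantic ranking, and then re-grounding the winning atom's action into a concrete hint. A minimal sketch of that pipeline follows; every name here is a hypothetical illustration, and bag-of-words cosine stands in for whatever embedding similarity the actual system uses:

```python
import math
import re
from collections import Counter

def slots_of(atom: dict) -> set[str]:
    """Symbolic slot names (OBJ, LOC, ...) used anywhere in the atom."""
    return set(re.findall(r"<(\w+)>", " ".join(atom.values())))

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(state: str, goal: str, atoms: list[dict],
             fillable: set[str], k: int = 1) -> list[dict]:
    """Hybrid retrieval: symbolic slot filter, then semantic ranking."""
    # Symbolic stage: keep atoms whose slots the current context can bind.
    candidates = [a for a in atoms if slots_of(a) <= fillable]
    # Semantic stage: rank by similarity of the atom's state+goal to the query.
    candidates.sort(key=lambda a: cosine(a["state"] + " " + a["goal"],
                                         state + " " + goal), reverse=True)
    return candidates[:k]

def reground(template: str, bindings: dict[str, str]) -> str:
    """Re-ground an atom: fill its symbolic slots with concrete entities."""
    for slot, entity in bindings.items():
        template = template.replace(f"<{slot}>", entity)
    return template

# Toy atom library distilled offline (illustrative, not from the paper).
library = [
    {"state": "the <OBJ> is in the <LOC>", "goal": "unlock the <TGT>",
     "action": "take the <OBJ> from the <LOC>"},
    {"state": "the <DOOR> is closed", "goal": "enter the <ROOM>",
     "action": "open the <DOOR>"},
]

hit = retrieve("the silver key is in the box", "unlock the vault",
               library, fillable={"OBJ", "LOC", "TGT"})[0]
hint = reground(hit["action"], {"OBJ": "silver key", "LOC": "box"})
print(hint)  # take the silver key from the box
```

The re-grounded string would then be injected into the planner's prompt as a soft hint rather than executed directly, which is what lets a frozen model benefit from offline search without any parameter updates.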