Adaptive Parallel Monte Carlo Tree Search for Efficient Test-time Compute Scaling

arXiv cs.AI / 4/2/2026


Key Points

  • The paper proposes “negative early exit” for Monte Carlo Tree Search to prune unproductive trajectories and address long-tail latency caused by variable MCTS execution times.
  • It also introduces an adaptive boosting mechanism that reallocates reclaimed computation to concurrent searches to reduce resource contention.
  • The authors integrate these methods into vLLM and report substantially lower p99 end-to-end latency while improving throughput.
  • The approach is designed to maintain reasoning accuracy even as test-time compute scaling behavior becomes more efficient and predictable.

Abstract

Monte Carlo Tree Search (MCTS) is an effective test-time compute scaling (TTCS) method for improving the reasoning performance of large language models, but its highly variable execution time leads to severe long-tail latency in practice. Existing optimizations, such as positive early exit, reduce latency in favorable cases but are less effective when search continues without meaningful progress. We introduce “negative early exit”, which prunes unproductive MCTS trajectories, and an “adaptive boosting mechanism” that reallocates reclaimed computation to reduce resource contention among concurrent searches. Integrated into vLLM, these techniques substantially reduce p99 end-to-end latency while improving throughput and maintaining reasoning accuracy.