Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search

arXiv cs.CL / 4/27/2026


Key Points

  • The paper argues that Monte Carlo Tree Search (MCTS)-style synthetic data generation for LLM-based multi-agent systems can be inefficient when it selects data based only on Q-values.
  • It introduces Data Influence-oriented Tree Search (DITS), which uses data influence scores to guide both the tree search process and which synthetic data to select for training.
  • The authors develop methods to estimate influence scores for non-differentiable metrics while lowering computation cost by reusing inference-time computations.
  • Experiments across eight multi-agent datasets show DITS is robust and effective, and that spending more inference budget on influence-score estimation (not Q-values) improves training efficiency and performance.

Abstract

Monte Carlo Tree Search (MCTS) based methods provide promising approaches for generating synthetic data to enhance the self-training of Large Language Model (LLM) based multi-agent systems (MAS). These methods leverage Q-values to estimate individual agent contributions. However, relying solely on Q-values to identify informative data may misalign with the data synthesis objective, as the focus should be on selecting data that best enhances model training. To address this discrepancy, we propose Data Influence-oriented Tree Search (DITS), a novel framework that incorporates influence scores to guide both tree search and data selection. By leveraging influence scores, we effectively identify the most impactful data for system improvement, thereby enhancing model performance. Furthermore, we derive influence score estimation methods tailored for non-differentiable metrics, significantly reducing computational overhead by utilizing inference computations. Extensive experiments on eight multi-agent datasets demonstrate the robustness and effectiveness of the proposed methods. Notably, our findings reveal that allocating more inference resources to estimate influence scores, rather than Q-values, during data synthesis can more effectively and efficiently enhance model training.
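The core contrast the abstract draws — ranking search-tree candidates by an estimated data-influence score rather than by Q-value — can be illustrated with a toy sketch. This is not the paper's algorithm: the `Node` fields, the UCB-style selection rule, and the influence values are all hypothetical stand-ins used only to show how swapping the ranking signal changes which trajectory gets expanded and selected for training.

```python
import math

# Hypothetical sketch: in standard MCTS-style data synthesis, children are
# ranked by Q-value (expected task reward). A DITS-like scheme instead ranks
# them by an estimated influence score, i.e. how much training on the node's
# trajectory is expected to improve the model. The numbers below are made up.

class Node:
    def __init__(self, name, q_value, influence):
        self.name = name
        self.q_value = q_value      # expected task reward (standard MCTS signal)
        self.influence = influence  # estimated training benefit (influence signal)
        self.visits = 0

def select_child(children, total_visits, c=1.4, use_influence=True):
    """UCB-style selection: exploit the chosen score, explore rarely visited nodes."""
    def ucb(node):
        score = node.influence if use_influence else node.q_value
        bonus = c * math.sqrt(math.log(total_visits + 1) / (node.visits + 1))
        return score + bonus
    return max(children, key=ucb)

children = [
    Node("a", q_value=0.9, influence=0.2),  # high reward, but the model already handles it
    Node("b", q_value=0.4, influence=0.8),  # moderate reward, highly informative to train on
]

by_q = select_child(children, total_visits=10, use_influence=False)
by_influence = select_child(children, total_visits=10)
print(by_q.name, by_influence.name)  # → a b
```

With identical exploration bonuses, Q-value selection favors the already-mastered trajectory `a`, while influence-based selection favors `b`, whose data is more useful for training — the misalignment between reward-seeking and training-benefit-seeking that motivates DITS.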
