Efficient Multi-Agent System Training with Data Influence-Oriented Tree Search
arXiv cs.CL / 4/27/2026
Key Points
- The paper argues that Monte Carlo Tree Search (MCTS)-style synthetic data generation for LLM-based multi-agent systems can be inefficient when it selects data based only on Q-values.
- It introduces Data Influence-oriented Tree Search (DITS), which uses data influence scores to guide both the tree search itself and the selection of synthetic data for training.
- The authors develop methods to estimate influence scores for non-differentiable metrics while lowering computation cost by reusing inference-time computations.
- Experiments on eight multi-agent datasets show that DITS is robust and effective, and that allocating more inference budget to influence-score estimation, rather than to Q-value estimation, improves training efficiency and performance.
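
To make the contrast between Q-value-based and influence-based selection concrete, here is a minimal toy sketch in Python. It is not the paper's implementation: the `Candidate` class, the `select_top_k` helper, and all scores are hypothetical, and the influence values are simply given rather than estimated. It only shows that ranking the same candidates by the two scores can select different training data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A synthetic training example produced by tree search (hypothetical)."""
    name: str
    q_value: float    # assumed estimate of the trajectory's expected reward
    influence: float  # assumed estimate of how much training on it helps


def select_top_k(candidates, k, key):
    """Return the k candidates with the largest score under `key`."""
    return sorted(candidates, key=key, reverse=True)[:k]


# Made-up scores, chosen so that the two rankings disagree.
candidates = [
    Candidate("a", q_value=0.9, influence=0.05),
    Candidate("b", q_value=0.7, influence=0.40),
    Candidate("c", q_value=0.6, influence=0.30),
    Candidate("d", q_value=0.8, influence=-0.10),
]

by_q = select_top_k(candidates, 2, key=lambda c: c.q_value)
by_influence = select_top_k(candidates, 2, key=lambda c: c.influence)

print([c.name for c in by_q])          # ['a', 'd']
print([c.name for c in by_influence])  # ['b', 'c']
```

In this toy data, candidate "d" has a high Q-value but a negative influence score, so Q-value selection keeps it while influence-based selection drops it. That is the kind of mismatch the summary attributes to selecting on Q-values alone.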