TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

arXiv cs.AI / 4/16/2026


Key Points

  • The paper introduces TREX, a multi-agent system designed to automate the full lifecycle of LLM fine-tuning, from requirement analysis through training and evaluation.
  • TREX coordinates a “Researcher” and an “Executor” to conduct literature/data research, devise training strategies, generate data recipes, and run model training experiments.
  • It represents multi-round experimentation as a search tree, allowing the system to plan exploration paths, reuse prior results, and extract higher-level insights from iterative trials.
  • To assess automated training quality, the authors build FT-Bench, a benchmark of 10 fine-tuning tasks derived from real-world scenarios, covering both general capability improvements and domain-specific performance gains.
  • Reported experiments indicate TREX can consistently improve model performance on the benchmark’s target tasks via its automated workflow.

Abstract

While Large Language Models (LLMs) have empowered AI research agents to perform isolated scientific tasks, automating complex, real-world workflows, such as LLM training, remains a significant challenge. In this paper, we introduce TREX, a multi-agent system that automates the entire LLM training lifecycle. By orchestrating collaboration between two core modules, the Researcher and the Executor, the system seamlessly performs requirement analysis, open-domain literature and data research, formulation of training strategies, preparation of data recipes, and model training and evaluation. The multi-round experimental process is modeled as a search tree, enabling the system to efficiently plan exploration paths, reuse historical results, and distill high-level insights from iterative trials. To evaluate the capability of automated LLM training, we construct FT-Bench, a benchmark comprising 10 tasks derived from real-world scenarios, ranging from optimizing fundamental model capabilities to enhancing performance on domain-specific tasks. Experimental results demonstrate that the TREX agent consistently optimizes model performance on target tasks.
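
To make the search-tree framing concrete, here is a minimal Python sketch of multi-round experimentation as tree exploration. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how TREX selects which node to expand or what a node stores, so the greedy best-first policy, the `ExperimentNode` class, and the `propose` and `evaluate` callbacks (loosely standing in for the Researcher and Executor roles) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class ExperimentNode:
    """One fine-tuning trial; children refine the parent's recipe (hypothetical)."""
    recipe: tuple                      # assumed: a hashable description of the trial
    score: Optional[float] = None
    parent: Optional["ExperimentNode"] = None
    children: list = field(default_factory=list)


def explore(root_recipe, propose, evaluate, rounds):
    """Greedy best-first exploration over a tree of experiments.

    propose(recipe)  -> list of child recipes (stand-in for a Researcher role)
    evaluate(recipe) -> float score           (stand-in for an Executor role)
    """
    cache = {}  # reuse of historical results: identical recipes are run once

    def run(recipe):
        if recipe not in cache:
            cache[recipe] = evaluate(recipe)
        return cache[recipe]

    root = ExperimentNode(recipe=root_recipe)
    root.score = run(root_recipe)
    frontier = [root]  # nodes that have been scored but not yet expanded
    best = root

    for _ in range(rounds):
        if not frontier:
            break
        # Assumed policy: expand the highest-scoring unexpanded node.
        node = max(frontier, key=lambda n: n.score)
        frontier.remove(node)
        for child_recipe in propose(node.recipe):
            child = ExperimentNode(recipe=child_recipe, parent=node)
            child.score = run(child_recipe)
            node.children.append(child)
            frontier.append(child)
            if child.score > best.score:
                best = child
    return root, best, cache


def path_to(node):
    """Recipes from the root down to `node`, i.e. one exploration path."""
    path = []
    while node is not None:
        path.append(node.recipe)
        node = node.parent
    return path[::-1]
```

In this sketch, `path_to(best)` recovers the sequence of refinements that led to the best trial, which is the kind of record a system could summarize into higher-level insights. A real system would likely balance exploration against exploitation rather than always expanding the current best node.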