ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering

arXiv cs.CL / 5/4/2026

Key Points

  • The paper argues that the dominant prompt-based paradigm for autonomous ML agents is limited: smaller models lack the capacity to learn from execution trajectories and generalize, while large proprietary models are too computationally expensive to deploy at scale.
  • It proposes “learning-based agentic ML,” where an LLM agent improves by interacting with ML tasks and learning via online reinforcement learning rather than relying purely on prompts.
  • The authors introduce a training framework with three components: exploration-enriched fine-tuning to diversify actions, step-wise RL to train from single action steps for faster data collection, and an ML-specific reward module to unify different ML feedback into RL rewards (sketches of the reward module and step-wise collection follow below).
  • Using this framework, they train “ML-Agent” on a 7B-sized Qwen-2.5 LLM and show that, despite training on only 9 ML tasks, it matches the performance of agents driven by much larger proprietary models at substantially lower computational cost, while also generalizing across tasks.
  • Overall, the work suggests a more efficient and accessible path for building autonomous ML agents by combining LLM agents with interactive RL and task-specific reward shaping.
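
To make the reward module concrete, here is a minimal sketch of what unifying heterogeneous ML feedback into a scalar RL reward could look like. The paper does not publish this formula; `StepFeedback`, `unified_reward`, the failure penalty, and the clipped relative-improvement shaping are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepFeedback:
    """Heterogeneous feedback from executing one agent action on an ML task (hypothetical)."""
    ran_ok: bool                      # did the generated code execute without error?
    metric: Optional[float] = None    # task metric (e.g., validation accuracy), if produced
    baseline: Optional[float] = None  # best metric seen so far on this task


def unified_reward(fb: StepFeedback, fail_penalty: float = -1.0) -> float:
    """Map varied ML feedback into one scalar reward for RL.

    Hypothetical shaping scheme, not the paper's exact design: failed
    executions get a fixed penalty, runs that produce no metric get 0,
    and successful runs are rewarded by improvement over the best result
    so far, normalized and clipped so rewards stay comparable across tasks.
    """
    if not fb.ran_ok:
        return fail_penalty
    if fb.metric is None:
        return 0.0
    if fb.baseline is None:
        return fb.metric  # first result on this task: reward the raw metric
    # Relative improvement, clipped to [-1, 1] to keep scales consistent across tasks.
    delta = (fb.metric - fb.baseline) / max(abs(fb.baseline), 1e-8)
    return max(-1.0, min(1.0, delta))
```

Normalizing against the task's own running best is one plausible way to satisfy the stated goal of "consistent rewards" across ML tasks whose raw metrics live on different scales.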

Abstract

The emergence of large language model (LLM)-based agents has significantly advanced the development of autonomous machine learning (ML) engineering. However, the dominant prompt-based paradigm exhibits limitations: smaller models lack the capacity to learn from execution trajectories for generalization, while large proprietary models incur high computational overhead, restricting accessibility and scalability. Focusing on this, for the first time, we explore the paradigm of learning-based agentic ML, where an LLM agent learns through interactive experimentation on ML tasks using online reinforcement learning (RL). To realize this, we propose a novel agentic ML training framework with three key components: (1) exploration-enriched fine-tuning, which enables LLM agents to generate diverse actions for enhanced RL exploration; (2) step-wise RL, which enables training on a single action step, accelerating experience collection and improving training efficiency; (3) an agentic ML-specific reward module, which unifies varied ML feedback signals into consistent rewards for RL optimization. Leveraging this framework, we train ML-Agent, driven by a 7B-sized Qwen-2.5 LLM for autonomous ML. Despite training on only 9 ML tasks, our 7B-sized ML-Agent achieves comparable performance to agents using much larger proprietary LLMs (e.g., GPT-5) but at significantly lower computational cost, demonstrating strong performance and cross-task generalization.
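
The abstract's key efficiency claim is step-wise RL: each training sample is a single (state, action, reward) step rather than a full ML-experiment trajectory. The sketch below illustrates that idea under assumed interfaces; `policy.generate`, `env.observe`, and `env.execute` are hypothetical stand-ins, not a published API, and `unified_reward` is the illustrative function from the sketch above.

```python
import random

def collect_step_experiences(policy, envs, buffer, steps_per_batch=64):
    """Step-wise experience collection (illustrative sketch).

    Because each sample is one action step, a batch fills after
    `steps_per_batch` environment steps instead of that many complete
    ML-experiment runs, which is where the claimed speedup comes from.
    """
    for _ in range(steps_per_batch):
        env = random.choice(envs)          # sample an ML task environment
        state = env.observe()              # task context / partial trajectory so far
        action = policy.generate(state)    # one agent action (e.g., edit and run code)
        feedback = env.execute(action)     # execute it and gather ML feedback
        reward = unified_reward(feedback)  # reward module sketched earlier
        buffer.append((state, action, reward))
    return buffer
```

The collected buffer would then feed an online policy-gradient update; the exploration-enriched fine-tuning stage the abstract describes would matter here, since diverse actions from `policy.generate` are what give such single-step RL useful coverage.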