CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

arXiv cs.CL, April 20, 2026


Key Points

  • The paper argues that reinforcement learning for LLM agents often relies on a static data distribution, which does not adapt to the agent's changing behavior and leaves complex environment interactions poorly covered.
  • It introduces CoEvolve, a closed-loop “agent–data mutual evolution” framework that uses rollout feedback signals (e.g., forgetting and uncertainty) to detect failure-prone interaction patterns.
  • CoEvolve turns those detected patterns into LLM-based synthesized tasks, validates them via environment interactions, and then uses the results to update the training data distribution.
  • Experiments on AppWorld and BFCL with Qwen2.5-7B, Qwen3-4B, and Qwen3-30B-A3B show consistent, significant improvements over strong base models, with absolute gains of 19.43%, 15.58%, and 18.14%, respectively.
  • Overall, the approach demonstrates joint adaptation of both the agent policy and the data it learns from, aiming to better match evolving environment dynamics.
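The closed loop described above can be sketched in miniature. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the skill-based success model, and the uncertainty heuristic are all hypothetical stand-ins for the paper's rollout collection, feedback extraction, LLM-based task synthesis, and RL policy update.

```python
import random
from collections import defaultdict

random.seed(0)

def rollout(policy, tasks):
    """Run the agent on each task; record success and a toy uncertainty signal."""
    records = []
    for task in tasks:
        p_success = policy["skill"].get(task["pattern"], 0.3)
        records.append({"task": task,
                        "success": random.random() < p_success,
                        "uncertainty": 1.0 - p_success})
    return records

def detect_failure_patterns(records, threshold=0.5):
    """Flag interaction patterns with a high rate of failure or uncertainty."""
    stats = defaultdict(lambda: [0, 0])  # pattern -> [flagged, total]
    for r in records:
        s = stats[r["task"]["pattern"]]
        s[1] += 1
        if not r["success"] or r["uncertainty"] > threshold:
            s[0] += 1
    return [p for p, (flagged, total) in stats.items()
            if flagged / total >= threshold]

def synthesize_tasks(patterns, n=4):
    """Stand-in for LLM-based synthesis of tasks targeting weak patterns."""
    return [{"pattern": p, "desc": f"synthetic task targeting {p}"}
            for p in patterns for _ in range(n)]

def validate_tasks(tasks):
    """Stand-in for environment-interaction validation of synthesized tasks."""
    return [t for t in tasks if t["pattern"]]

def update_policy(policy, records):
    """Toy RL update: each training record nudges skill on its pattern upward."""
    for r in records:
        p = r["task"]["pattern"]
        policy["skill"][p] = min(1.0, policy["skill"].get(p, 0.3) + 0.15)

# Closed loop: the policy and the data distribution adapt jointly.
policy = {"skill": {}}
data = [{"pattern": "multi_api_call", "desc": "seed task"},
        {"pattern": "error_recovery", "desc": "seed task"}]

for step in range(4):
    records = rollout(policy, data)
    weak = detect_failure_patterns(records)
    data += validate_tasks(synthesize_tasks(weak))  # shift data toward weak spots
    update_policy(policy, records)

print(len(data), policy["skill"])
```

The key property the sketch shows is the feedback direction: synthesized tasks concentrate on patterns where the rollouts reveal failure or uncertainty, so as the policy improves on a pattern, that pattern stops driving data growth.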

Abstract

Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent's evolving behavior and leads to poor coverage of complex environment interactions. To address these challenges, we propose CoEvolve, an agent-data mutual evolution framework that enables LLM agents to improve through closed-loop, interaction-driven training. Specifically, CoEvolve extracts feedback signals such as forgetting and uncertainty from rollout trajectories to identify failure-prone interaction patterns, and utilizes them to guide LLM-based task synthesis. The synthesized tasks are validated through environment interaction and utilized to update the data distribution, enabling joint adaptation of the agent and its data. Extensive experiments on AppWorld and BFCL across Qwen2.5-7B, Qwen3-4B, and Qwen3-30B-A3B demonstrate consistent and significant improvements over strong base models, yielding absolute gains of 19.43%, 15.58%, and 18.14%, respectively.