CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution
arXiv cs.CL / 4/20/2026
Key Points
- The paper argues that reinforcement learning for LLM agents often relies on static data distributions, which do not adapt to changing agent behaviors and can miss complex interaction coverage.
- It introduces CoEvolve, a closed-loop “agent–data mutual evolution” framework that uses rollout feedback signals (e.g., forgetting and uncertainty) to detect failure-prone interaction patterns.
- CoEvolve turns those detected patterns into LLM-based synthesized tasks, validates them via environment interactions, and then uses the results to update the training data distribution.
- Experiments on AppWorld and BFCL with Qwen2.5-7B, Qwen3-4B, and Qwen3-30B-A3B show consistent, significant improvements over strong baseline models, with absolute gains of 19.43%, 15.58%, and 18.14%.
- Overall, the approach demonstrates joint adaptation of both the agent policy and the data it learns from, aiming to better match evolving environment dynamics.
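The closed loop described above — detect weak interaction patterns from rollout feedback, synthesize new tasks for them, fold the validated tasks back into training — can be sketched in toy form. All function names, the skill-table agent, and the numeric update rule below are illustrative assumptions, not the paper's actual implementation; the point is only the control flow of agent–data mutual evolution.

```python
# Toy sketch of an agent-data mutual evolution loop (CoEvolve-style).
# Everything here is a stand-in: the "agent" is a per-pattern skill table,
# "uncertainty" is a distance-to-decision-boundary proxy, and task
# synthesis just clones templates instead of calling an LLM.

def rollout_feedback(agent, task):
    """One rollout: return (success, uncertainty) for a task."""
    score = agent["skill"].get(task["pattern"], 0.0)
    success = score > 0.5
    # Uncertainty peaks when the agent is near the success boundary.
    uncertainty = 1.0 - abs(score - 0.5) * 2
    return success, uncertainty

def detect_failure_patterns(agent, tasks, threshold=0.5):
    """Flag interaction patterns with failures or high uncertainty."""
    flagged = set()
    for t in tasks:
        success, unc = rollout_feedback(agent, t)
        if not success or unc > threshold:
            flagged.add(t["pattern"])
    return sorted(flagged)

def synthesize_tasks(patterns, n_per_pattern=2):
    """Stand-in for LLM-based task synthesis targeting weak patterns."""
    return [{"pattern": p} for p in patterns for _ in range(n_per_pattern)]

def train(agent, tasks, lr=0.2):
    """Toy policy update: skill on a pattern rises with each training task."""
    for t in tasks:
        p = t["pattern"]
        old = agent["skill"].get(p, 0.0)
        agent["skill"][p] = old + lr * (1.0 - old)

def coevolve(agent, data, rounds=5):
    for _ in range(rounds):
        weak = detect_failure_patterns(agent, data)  # rollout feedback signals
        data = data + synthesize_tasks(weak)         # data distribution shifts toward weak spots
        train(agent, data)                           # policy adapts to the new distribution
    return agent, data

agent = {"skill": {"api_call": 0.2, "multi_step": 0.1}}
data = [{"pattern": "api_call"}, {"pattern": "multi_step"}]
agent, data = coevolve(agent, data)
```

After a few rounds, the data pool has grown around the initially weak patterns and the agent's skill on them has risen — a minimal illustration of the joint adaptation the paper targets, with the paper's forgetting signal, environment validation step, and actual RL update all abstracted away.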
Related Articles
- From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too) (Dev.to)
- GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI (Dev.to)
- Building Digital Souls: The Brutal Reality of Creating AI That Understands You Like Nobody Else (Dev.to)
- Local LLM Beginner’s Guide (Mac - Apple Silicon) (Reddit r/artificial)
- Is Your Skill Actually Good? Systematically Validating Agent Skills with Evals (Dev.to)