EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

arXiv cs.AI / 4/14/2026


Key Points

  • The paper proposes a unified “hybrid policy learning” view of MCP-GUI agents, teaching the agent how to choose between structured API calls (MCP) and GUI interaction when each is most effective.
  • It argues that distillation and experience augmentation address different failure modes, so mechanism selection should be application-aware.
  • The authors introduce EE-MCP, a self-evolving framework with a fully automatic pipeline for environment generation/validation, trajectory collection, gap-driven task synthesis, and quality-filtered training without manual intervention.
  • A core component is an “experience bank” that stores LLM-learned rules from trajectory comparisons, enabling improvement at inference time without additional fine-tuning.
  • Cross-application experiments on three desktop apps show strategy-dependent gains: distillation performs much better on MCP-dominant tasks (77.8% pass rate, +17.8pp), while the experience bank is superior on GUI-intensive tasks (+10.0pp).
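The experience bank described above can be pictured as a store of application-tagged rules, distilled from trajectory comparisons and retrieved at inference time to augment the agent's prompt. The following is a minimal illustrative sketch; all class and field names are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of an "experience bank": rules learned by comparing
# successful and failed trajectories are stored per application and
# retrieved at inference time, with no additional fine-tuning.
from dataclasses import dataclass, field

@dataclass
class Rule:
    app: str        # application the rule was learned on
    condition: str  # substring of the task that triggers the rule
    advice: str     # guidance injected into the agent's prompt

@dataclass
class ExperienceBank:
    rules: list = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def retrieve(self, app: str, task: str) -> list:
        """Return the advice of every rule matching this app and task."""
        return [r.advice for r in self.rules
                if r.app == app and r.condition in task.lower()]

bank = ExperienceBank()
bank.add(Rule("calendar", "recurring",
              "Prefer the MCP create_event call over GUI date pickers."))
advice = bank.retrieve("calendar", "Create a recurring weekly meeting")
```

In this sketch, retrieved advice strings would be concatenated into the agent's context before it acts, which is how inference-time improvement without fine-tuning could work in practice.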

Abstract

Computer-use agents that combine GUI interaction with structured API calls via the Model Context Protocol (MCP) show promise for automating software tasks. However, existing approaches lack a principled understanding of how agents should balance these two modalities and how to enable iterative self-improvement across diverse applications. We formulate MCP-GUI interplay as a unified hybrid policy learning problem where the agent learns when each modality provides complementary advantages, and show that distillation and experience augmentation target fundamentally different failure modes, requiring application-aware mechanism selection. Built on this formulation, we propose a self-evolving framework with a fully automatic pipeline that orchestrates automatic environment generation and validation, trajectory collection, gap-driven task synthesis, and quality-filtered training, all without manual intervention. A key innovation is our experience bank, which accumulates LLM-learned rules from trajectory comparison, enabling inference-time improvement without fine-tuning. Systematic **cross-application analysis** across three desktop applications reveals that the optimal strategy depends on MCP-GUI composition: distillation achieves 77.8% pass rate on MCP-dominant tasks (+17.8pp), while the experience bank excels on GUI-intensive tasks (+10.0pp).