EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

arXiv cs.AI / 4/14/2026


Key Points

  • The paper proposes a unified “hybrid policy learning” view of MCP-GUI agents, teaching the agent how to choose between structured API calls (MCP) and GUI interaction when each is most effective.
  • It argues that distillation and experience augmentation address different failure modes, so mechanism selection should be application-aware.
  • The authors introduce EE-MCP, a self-evolving framework with a fully automatic pipeline for environment generation/validation, trajectory collection, gap-driven task synthesis, and quality-filtered training without manual intervention.
  • A core component is an “experience bank” that stores LLM-learned rules from trajectory comparisons, enabling improvement at inference time without additional fine-tuning.
  • Cross-application experiments on three desktop apps show strategy-dependent gains: distillation performs much better on MCP-dominant tasks (77.8% pass rate, +17.8pp), while the experience bank is superior on GUI-intensive tasks (+10.0pp).
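The experience bank described above can be pictured as a store of application-tagged rules, distilled from trajectory comparisons and retrieved at inference time to augment the agent's prompt. The following is a minimal illustrative sketch; all class and field names are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of an "experience bank": rules learned by comparing
# successful and failed trajectories are stored per application and
# retrieved at inference time, with no additional fine-tuning.
from dataclasses import dataclass, field

@dataclass
class Rule:
    app: str        # application the rule was learned on
    condition: str  # substring of the task that triggers the rule
    advice: str     # guidance injected into the agent's prompt

@dataclass
class ExperienceBank:
    rules: list = field(default_factory=list)

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def retrieve(self, app: str, task: str) -> list:
        """Return the advice of every rule matching this app and task."""
        return [r.advice for r in self.rules
                if r.app == app and r.condition in task.lower()]

bank = ExperienceBank()
bank.add(Rule("calendar", "recurring",
              "Prefer the MCP create_event call over GUI date pickers."))
advice = bank.retrieve("calendar", "Create a recurring weekly meeting")
```

In this sketch, retrieved advice strings would be concatenated into the agent's context before it acts, which is how inference-time improvement without fine-tuning could work in practice.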

Abstract

Computer-use agents that combine GUI interaction with structured API calls via the Model Context Protocol (MCP) show promise for automating software tasks. However, existing approaches lack a principled understanding of how agents should balance these two modalities and how to enable iterative self-improvement across diverse applications. We formulate MCP-GUI interplay as a unified hybrid policy learning problem where the agent learns when each modality provides complementary advantages, and show that distillation and experience augmentation target fundamentally different failure modes, requiring application-aware mechanism selection. Built on this formulation, we propose a self-evolving framework with a fully automatic pipeline that orchestrates automatic environment generation and validation, trajectory collection, gap-driven task synthesis, and quality-filtered training, all without manual intervention. A key innovation is our experience bank, which accumulates LLM-learned rules from trajectory comparison, enabling inference-time improvement without fine-tuning. Systematic **cross-application analysis** across three desktop applications reveals that the optimal strategy depends on MCP-GUI composition: distillation achieves 77.8% pass rate on MCP-dominant tasks (+17.8pp), while the experience bank excels on GUI-intensive tasks (+10.0pp).