WebXSkill: Skill Learning for Autonomous Web Agents

arXiv cs.AI / 4/16/2026

Key Points

WebXSkill addresses a “grounding gap” in autonomous LLM web agents by converting web workflow skills into executable, parameterized action programs paired with step-level natural-language guidance for understanding and recovery.
The framework extracts reusable action subsequences from synthetic agent trajectories, organizes the resulting skills in a URL-based graph for context-aware retrieval, and then deploys them in both fully automated “grounded mode” and agent-assisted “guided mode.”
Experiments on WebArena and WebVoyager show higher task success rates, with gains of up to 9.8 and 12.9 points over the baseline, respectively.
The accompanying code is released publicly, enabling others to build on and evaluate the executable-skill approach for long-horizon browser tasks.
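
To make the idea of URL-based, context-aware retrieval more concrete, here is a minimal Python sketch. It is not taken from the WebXSkill code base: the class name `SkillIndex`, its methods, and the example URLs are all hypothetical, and the graph is simplified to a hierarchy of URL path prefixes. The assumption is that a skill registered for a page should also be offered on pages nested beneath it.

```python
from urllib.parse import urlparse


class SkillIndex:
    """Hypothetical index mapping URL nodes to skill names (illustration only)."""

    def __init__(self):
        # (host, tuple of path segments) -> list of skill names
        self._by_node = {}

    @staticmethod
    def _node(url):
        # Reduce a URL to its host and non-empty path segments; query strings are ignored.
        parsed = urlparse(url)
        segments = tuple(s for s in parsed.path.split("/") if s)
        return parsed.netloc, segments

    def add(self, url, skill_name):
        self._by_node.setdefault(self._node(url), []).append(skill_name)

    def retrieve(self, url):
        # Walk from the most specific page up to the site root,
        # collecting skills registered at each ancestor node.
        host, segments = self._node(url)
        found = []
        for depth in range(len(segments), -1, -1):
            found.extend(self._by_node.get((host, segments[:depth]), []))
        return found


index = SkillIndex()
index.add("https://shop.example/", "search_product")
index.add("https://shop.example/cart", "apply_coupon")
index.add("https://shop.example/admin/orders", "filter_orders")

# Skills registered for the cart page and the site root are offered; admin skills are not.
print(index.retrieve("https://shop.example/cart/checkout?step=2"))
```

Ordering results from most to least specific is a design choice of this sketch, so that page-specific skills are considered before site-wide ones.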

Abstract

Autonomous web agents powered by large language models (LLMs) have shown promise in completing complex browser tasks, yet they still struggle with long-horizon workflows. A key bottleneck is the grounding gap in existing skill formulations: textual workflow skills provide natural language guidance but cannot be directly executed, while code-based skills are executable but opaque to the agent, offering no step-level understanding for error recovery or adaptation. We introduce WebXSkill, a framework that bridges this gap with executable skills, each pairing a parameterized action program with step-level natural language guidance, enabling both direct execution and agent-driven adaptation. WebXSkill operates in three stages. Skill extraction mines reusable action subsequences from readily available synthetic agent trajectories and abstracts them into parameterized skills. Skill organization indexes skills into a URL-based graph for context-aware retrieval. Skill deployment exposes two complementary modes: grounded mode for fully automated multi-step execution, and guided mode, in which skills serve as step-by-step instructions that the agent follows with its native planning. On WebArena and WebVoyager, WebXSkill improves task success rate by up to 9.8 and 12.9 points over the baseline, respectively, demonstrating the effectiveness of executable skills for web agents. The code is publicly available at https://github.com/aiming-lab/WebXSkill.
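
The pairing of a parameterized action program with step-level guidance, and the two deployment modes, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `Step`, `Skill`, `run_grounded`, and `render_guided`, the element selectors, and the `execute` callback standing in for a browser driver are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str    # e.g. "click" or "type" (assumed action vocabulary)
    target: str    # hypothetical element selector
    value: str     # may contain {param} placeholders
    guidance: str  # step-level natural-language guidance; may contain {param} placeholders


@dataclass
class Skill:
    name: str
    params: List[str]
    steps: List[Step]

    def bind(self, **args) -> List[Step]:
        # Substitute concrete arguments into every step of the skill.
        missing = [p for p in self.params if p not in args]
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        return [
            Step(s.action, s.target.format(**args),
                 s.value.format(**args), s.guidance.format(**args))
            for s in self.steps
        ]


def run_grounded(skill: Skill, execute: Callable[[str, str, str], None], **args) -> int:
    """Grounded mode: run every bound step through `execute`; return the step count."""
    steps = skill.bind(**args)
    for step in steps:
        execute(step.action, step.target, step.value)
    return len(steps)


def render_guided(skill: Skill, **args) -> str:
    """Guided mode: turn the skill into numbered instructions for the agent to follow."""
    return "\n".join(
        f"{i}. {s.guidance}" for i, s in enumerate(skill.bind(**args), start=1)
    )


search = Skill(
    name="search_product",
    params=["query"],
    steps=[
        Step("type", "#search", "{query}", "Type '{query}' into the search box."),
        Step("click", "#search-submit", "", "Click the search button to submit."),
    ],
)

log = []  # stands in for a real browser; records the actions that would be executed
run_grounded(search, lambda a, t, v: log.append((a, t, v)), query="red shoes")
print(log)
print(render_guided(search, query="red shoes"))
```

Because both modes read from the same `Skill` object, an agent in this sketch could fall back from grounded execution to the guided instructions when a step fails, which is the kind of recovery the step-level guidance is meant to support.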