SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

arXiv cs.CL / 4/10/2026

Key Points

  • SkillClaw highlights the problem that the reusable skills used by LLM agents remain largely fixed after deployment and rarely improve.
  • It proposes a framework that aggregates behavioral trajectories (success and failure patterns) across multiple users and over time as the primary signal, with an autonomous evolver extracting recurring behaviors and converting them into skill updates.
  • Updates are applied to a shared repository, either refining existing skills or adding new capabilities, and are synchronized across users, so an improvement discovered in one context propagates system-wide.
  • Experiments on WildClawBench show that even limited interaction and feedback substantially improve the real-world agent performance of Qwen3-Max.

Abstract

Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment. As a result, similar workflows, tool usage patterns, and failure modes are repeatedly rediscovered across users, preventing the system from improving with experience. While interactions from different users provide complementary signals about when a skill works or fails, existing systems lack a mechanism to convert such heterogeneous experiences into reliable skill updates. To address these issues, we present SkillClaw, a framework for collective skill evolution in multi-user agent ecosystems, which treats cross-user and over-time interactions as the primary signal for improving skills. SkillClaw continuously aggregates trajectories generated during use and processes them with an autonomous evolver, which identifies recurring behavioral patterns and translates them into updates to the skill set by refining existing skills or extending them with new capabilities. The resulting skills are maintained in a shared repository and synchronized across users, allowing improvements discovered in one context to propagate system-wide while requiring no additional effort from users. By integrating multi-user experience into ongoing skill updates, SkillClaw enables cross-user knowledge transfer and cumulative capability improvement, and experiments on WildClawBench show that, with limited interaction and feedback, it significantly improves the performance of Qwen3-Max in real-world agent scenarios.
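The abstract describes a loop in which an evolver aggregates cross-user trajectories and folds recurring patterns back into a shared skill repository. The paper does not specify an implementation, but the core idea can be sketched as follows; all class and method names (`Skill`, `SkillRepository`, `Evolver`, `min_users`) are hypothetical, and "recurring pattern" is approximated here as a failure note reported by at least two distinct users:

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Skill:
    # Hypothetical skill record: a name, a version counter, and
    # refinement notes accumulated from observed failure patterns.
    name: str
    version: int = 1
    refinements: list = field(default_factory=list)

class SkillRepository:
    """Shared repository: every user resolves skills here, so an
    update made once is visible system-wide (the 'sync' step)."""
    def __init__(self):
        self._skills = {}

    def get_or_create(self, name):
        return self._skills.setdefault(name, Skill(name))

class Evolver:
    """Aggregates (user, skill, pattern) observations; once a pattern
    recurs across enough distinct users, it is folded into the skill."""
    def __init__(self, repo, min_users=2):
        self.repo = repo
        self.min_users = min_users
        # (skill name, pattern) -> set of users who reported it
        self._support = defaultdict(set)

    def ingest(self, user, skill_name, pattern):
        self._support[(skill_name, pattern)].add(user)

    def evolve(self):
        updated = []
        for (skill_name, pattern), users in self._support.items():
            if len(users) >= self.min_users:
                skill = self.repo.get_or_create(skill_name)
                if pattern not in skill.refinements:
                    skill.refinements.append(pattern)
                    skill.version += 1
                    updated.append(skill_name)
        return updated

repo = SkillRepository()
evolver = Evolver(repo, min_users=2)
# Two users independently hit the same failure mode of a "web_search" skill,
# so the pattern crosses the support threshold and triggers a refinement.
evolver.ingest("alice", "web_search", "retry on HTTP 429")
evolver.ingest("bob", "web_search", "retry on HTTP 429")
# A pattern seen by only one user stays below the threshold for now.
evolver.ingest("alice", "summarize", "truncate long inputs")
updated = evolver.evolve()
```

The support threshold is one way to make the updates "reliable" in the sense the abstract mentions: a pattern must recur across users before it is allowed to change the shared skill set.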