Skills-Coach: A Self-Evolving Skill Optimizer via Training-Free GRPO

arXiv cs.CL / 5/1/2026


Key Points

  • Skills-Coach is a new automated framework aimed at improving how LLM-based agents self-evolve and expand their competencies, addressing fragmentation across existing skill sets.
  • The framework combines four modules (diverse task generation, lightweight prompt/code optimization, comparative execution of original and optimized skills, and traceable criteria-based evaluation) to systematically test and upgrade skills.
  • It uses “training-free GRPO” to optimize skills without additional training, with both virtual and real execution modes to support different deployment/testing setups.
  • The approach is validated with Skill-X, a benchmark dataset covering 48 diverse skills, where experiments show broad and significant improvements in skill capability across categories.
  • Overall, Skills-Coach is positioned as a step toward more robust, adaptable LLM agents with comprehensive skill coverage for intelligent applications.
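
The summary does not say how "training-free GRPO" scores candidate skills. As a rough illustration only, standard GRPO normalizes each sample's reward against the mean and standard deviation of its group; a training-free variant would presumably use that relative signal to choose or revise prompts instead of updating model weights. The sketch below assumes this reading, and the candidate names and scores are invented for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical evaluation scores for four versions of one skill prompt.
candidate_scores = {
    "original": 0.50,
    "variant_a": 0.70,
    "variant_b": 0.40,
    "variant_c": 0.80,
}

advantages = dict(
    zip(candidate_scores, group_relative_advantages(list(candidate_scores.values())))
)

# Assumption: with no gradient step, the advantage is used only to rank candidates.
best = max(advantages, key=advantages.get)
print(best, advantages)
```

Candidates with a positive advantage beat the group average; those with a negative advantage fall below it.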

Abstract

We introduce Skills-Coach, a novel automated framework designed to significantly enhance the self-evolution of skills within Large Language Model (LLM)-based agents. Addressing the current fragmentation of the skill ecosystem, Skills-Coach explores the boundaries of skill capabilities, thereby facilitating the comprehensive competency coverage essential for intelligent applications. The framework comprises four core modules: a Diverse Task Generation Module that systematically creates a comprehensive test suite for various skills; a Lightweight Optimization Module dedicated to optimizing skill prompts and their corresponding code; a Comparative Execution Module facilitating the execution and evaluation of both original and optimized skills; and a Traceable Evaluation Module, which rigorously evaluates performance against specified criteria. Skills-Coach offers flexible execution options through its virtual and real modes. To validate its efficacy, we introduce Skill-X, a comprehensive benchmark dataset consisting of 48 diverse skills. Experimental results demonstrate that Skills-Coach achieves significant performance improvements in skill capability across a wide range of categories, highlighting its potential to advance the development of more robust and adaptable LLM-based agents.
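
The four modules described in the abstract can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`Skill`, `Verdict`, `generate_tasks`, `optimize`, `execute`, `evaluate`, `coach`, `virtual_runner`, `has_check`) is hypothetical, the optimizer is a trivial prompt edit, and the runner is a stub standing in for the virtual execution mode.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Skill:
    name: str
    prompt: str


@dataclass
class Verdict:
    task: str
    original_score: float
    optimized_score: float
    criterion: str  # recorded so each score can be traced to what was checked


def generate_tasks(skill: Skill, n: int = 3) -> List[str]:
    """Task generation stand-in: a real module would vary task content."""
    return [f"{skill.name} task {i}" for i in range(1, n + 1)]


def optimize(skill: Skill) -> Skill:
    """Optimization stand-in: append one instruction to the prompt."""
    return Skill(skill.name, skill.prompt + " Check the result before answering.")


def execute(skill: Skill, task: str, runner: Callable[[str, str], str]) -> str:
    """Run one skill on one task through the supplied runner."""
    return runner(skill.prompt, task)


def evaluate(task: str, original_out: str, optimized_out: str,
             criterion: Callable[[str], float]) -> Verdict:
    """Score both outputs against the same explicit criterion."""
    return Verdict(task, criterion(original_out), criterion(optimized_out),
                   criterion.__name__)


def coach(skill: Skill, runner: Callable[[str, str], str],
          criterion: Callable[[str], float]) -> Tuple[Skill, List[Verdict]]:
    candidate = optimize(skill)
    verdicts = []
    for task in generate_tasks(skill):
        original_out = execute(skill, task, runner)
        optimized_out = execute(candidate, task, runner)
        verdicts.append(evaluate(task, original_out, optimized_out, criterion))
    wins = sum(v.optimized_score > v.original_score for v in verdicts)
    # Keep the optimized skill only if it wins on a majority of tasks.
    chosen = candidate if wins > len(verdicts) / 2 else skill
    return chosen, verdicts


def virtual_runner(prompt: str, task: str) -> str:
    """Stub for a virtual mode: simulates output without calling any tool."""
    body = "checked answer" if "check" in prompt.lower() else "answer"
    return f"[{task}] {body}"


def has_check(output: str) -> float:
    """Toy criterion: reward outputs that report a verification step."""
    return 1.0 if "checked" in output else 0.0


skill = Skill("csv-summary", "Summarize the CSV file.")
chosen, verdicts = coach(skill, virtual_runner, has_check)
print(chosen.prompt)
```

Passing the runner in as an argument is one way to switch between a virtual and a real mode without touching the rest of the loop. Storing the criterion name in each `Verdict` is a small gesture toward the traceability the abstract describes.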