I ported Anthropic's official skill-creator from Claude Code to OpenCode — now you can create and evaluate AI agent skills with any model

Reddit r/LocalLLaMA / 4/11/2026

💬 Opinion · Developer Stack & Infrastructure · Signals & Early Trends · Tools & Practical Usage

Key Points

  • A developer open-sourced an “eval-driven” AI agent skill creator that ports Anthropic’s official Claude Code skill-creator to OpenCode using TypeScript.
  • The tool supports guided skill creation via an intake interview, automatically generates eval sets (should-trigger/should-not-trigger prompts), and measures trigger accuracy by comparing runs with and without the skill.
  • It iteratively optimizes skill descriptions using an LLM loop with a train/test split (up to five iterations), and provides an HTML viewer plus variance/benchmark reporting for human review.
  • Because it is designed to work with OpenCode, it can evaluate and develop skills using any of OpenCode’s 300+ models, including locally hosted models.
  • Installation is offered via an npm one-command workflow, with the project released under an Apache 2.0 license and attributed to Anthropic’s original approach.

Hey r/LocalLLaMA — I open-sourced a tool that brings eval-driven development to AI agent skills. It's based on Anthropic's official skill-creator for Claude Code, but rewritten in TypeScript to work with OpenCode (which supports 300+ models including local ones).

The problem: creating skills for AI agents is trial and error. You write a skill, test it manually, and hope it triggers on the right prompts. There's no systematic way to measure whether a skill actually works.

What this does:

  • Guided skill creation with an intake interview
  • Auto-generates eval test sets (should-trigger and should-not-trigger queries)
  • Runs evals with and without the skill to measure trigger accuracy
  • Optimizes skill descriptions through an iterative LLM loop (60/40 train/test split, up to 5 iterations)
  • Visual HTML eval viewer for human review
  • Benchmarks with variance analysis across iterations
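To make "trigger accuracy" concrete, here is a minimal sketch (not the tool's actual code) of how it can be computed from the generated eval set: each case records whether the skill should have triggered and whether it actually did.

```typescript
// One eval case: a prompt, the expected trigger behavior, and what
// the agent actually did during the eval run.
type EvalCase = {
  prompt: string;
  shouldTrigger: boolean;
  didTrigger: boolean;
};

// Trigger accuracy = fraction of cases where actual behavior
// matched expected behavior (covers both false negatives on
// should-trigger prompts and false positives on should-not-trigger ones).
function triggerAccuracy(cases: EvalCase[]): number {
  if (cases.length === 0) return 0;
  const correct = cases.filter(c => c.shouldTrigger === c.didTrigger).length;
  return correct / cases.length;
}

// Hypothetical eval results for an image-processing skill:
const results: EvalCase[] = [
  { prompt: "resize this image",   shouldTrigger: true,  didTrigger: true },
  { prompt: "what's the weather",  shouldTrigger: false, didTrigger: false },
  { prompt: "crop the photo",      shouldTrigger: true,  didTrigger: false },
  { prompt: "compress to webp",    shouldTrigger: true,  didTrigger: true },
];

console.log(triggerAccuracy(results)); // 0.75
```

Running the same eval set with the skill installed and without it gives a baseline, so you can see how much the skill's description actually changes trigger behavior.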

The most interesting part for this community: it works with any of OpenCode's supported models. If you're running local models through OpenCode, you can use this tool with them.
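The 60/40 train/test split and capped iteration loop from the feature list can be sketched roughly like this. This is a hedged illustration, not the actual implementation: `proposeDescription` and `evalScore` are hypothetical stand-ins for the tool's LLM rewrite call and eval run.

```typescript
// Split eval prompts 60/40: optimize against train, report on test.
function splitTrainTest<T>(items: T[], trainFrac = 0.6): [T[], T[]] {
  const cut = Math.round(items.length * trainFrac);
  return [items.slice(0, cut), items.slice(cut)];
}

// Stand-in for the LLM step that proposes a revised skill description.
const proposeDescription = (prev: string, iter: number): string =>
  `${prev} (revision ${iter})`;

// Stand-in for running the eval set and returning trigger accuracy.
const evalScore = (desc: string, evalSet: string[]): number =>
  Math.min(1, desc.length / 100); // dummy scorer for illustration only

const evalPrompts = Array.from({ length: 10 }, (_, i) => `prompt-${i}`);
const [train, test] = splitTrainTest(evalPrompts); // 6 train, 4 test

// Up to five iterations; keep the best candidate by train-set score.
const MAX_ITERATIONS = 5;
let bestDesc = "Resize and convert images";
let bestTrainScore = evalScore(bestDesc, train);
for (let iter = 1; iter <= MAX_ITERATIONS; iter++) {
  const candidate = proposeDescription(bestDesc, iter);
  const score = evalScore(candidate, train);
  if (score > bestTrainScore) {
    bestDesc = candidate;
    bestTrainScore = score;
  }
}

// Final quality is reported on the held-out test split, so a
// description that merely overfits the train prompts shows up.
const finalScore = evalScore(bestDesc, test);
console.log(train.length, test.length, finalScore > 0);
```

The held-out test split is what makes this eval-driven rather than just iterative: improvements that don't generalize beyond the prompts used for optimization get caught.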

One-command install:

npx opencode-skill-creator install --global 

Apache 2.0 license. Based on Anthropic's skill-creator with attribution.

GitHub: https://github.com/antongulin/opencode-skill-creator

npm: https://www.npmjs.com/package/opencode-skill-creator

Happy to answer questions about the eval methodology, local model support, or architecture.

submitted by /u/antonusaca