Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
arXiv cs.AI / 4/25/2026
Key Points
- The paper introduces COSPLAY, a co-evolution framework designed for long-horizon interactive tasks where agents must chain skills over many timesteps under delayed rewards and partial observability.
- COSPLAY uses an LLM decision agent that retrieves structured skills from a learnable skill bank, making decision making more consistent across episodes.
- A separate “skill pipeline” agent discovers and refines reusable skills from unlabeled rollouts, continuously updating the skill bank and the associated contracts.
- Experiments on six game environments show that, with an 8B base model, COSPLAY achieves an average reward improvement of over 25.1% against four frontier LLM baselines on single-player benchmarks, while remaining competitive in multi-player social reasoning games.
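The key points describe a decision agent retrieving structured skills (with contracts) from a learnable bank. A minimal sketch of what such a retrieval step might look like, assuming keyword-overlap scoring as a stand-in for the paper's unspecified retrieval mechanism; all names (`Skill`, `SkillBank`, the contract strings) are hypothetical illustrations, not the authors' API:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    contract: str        # informal precondition/effect contract
    keywords: set[str]   # terms the skill is indexed under

@dataclass
class SkillBank:
    skills: list[Skill] = field(default_factory=list)

    def add(self, skill: Skill) -> None:
        self.skills.append(skill)

    def retrieve(self, observation: str, k: int = 2) -> list[Skill]:
        # Rank skills by keyword overlap with the current observation;
        # the top-k skills would be handed to the LLM decision agent.
        tokens = set(observation.lower().split())
        ranked = sorted(self.skills,
                        key=lambda s: len(s.keywords & tokens),
                        reverse=True)
        return ranked[:k]

bank = SkillBank()
bank.add(Skill("mine_wood", "requires: axe; yields: wood",
               {"tree", "wood", "forest"}))
bank.add(Skill("craft_tool", "requires: wood; yields: tool",
               {"craft", "tool", "bench"}))

top = bank.retrieve("you see a tree at the forest edge")
print([s.name for s in top])  # → ['mine_wood', 'craft_tool']
```

In the paper's setup, a second "skill pipeline" agent would continually rewrite the bank's entries and contracts from unlabeled rollouts, rather than the bank being fixed as in this sketch.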
Related Articles
Navigating WooCommerce AI Integrations: Lessons for Agencies & Developers from a Bluehost Conflict
Dev.to

One Day in Shenzhen, Seen Through an AI's Eyes
Dev.to

Underwhelming or underrated? DeepSeek V4 shows “impressive” gains
SCMP Tech

Claude Code: Hooks, Subagents, and Skills — Complete Guide
Dev.to

Finding the Gold: An AI Framework for Highlight Detection
Dev.to