From Context to Skills: Can Language Models Learn from Context Skillfully?

arXiv cs.AI / 5/1/2026


Key Points

  • Many real-world language-model tasks require learning and reasoning over long, complex contexts that go beyond the model’s fixed parametric knowledge, motivating “context learning.”
  • The paper proposes Ctx2Skill, which performs inference-time skill augmentation by autonomously discovering, refining, and selecting context-specific natural-language skills without human supervision or external feedback.
  • Ctx2Skill uses a multi-agent self-play loop (Challenger/Reasoner with a neutral Judge) plus Proposer/Generator components that analyze failures and turn them into targeted skill updates for both sides.
  • To maintain robustness and avoid adversarial collapse or over-specialization, it introduces a Cross-time Replay mechanism that selects skill sets providing the best balance across representative cases.
  • Experiments on four CL-bench context-learning tasks show that the learned skills can be plugged into multiple backbone language models and consistently improve solving rates.
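The self-play loop sketched in the points above can be illustrated as a minimal Python skeleton. All function bodies here are stand-in stubs invented for illustration (the real system drives LLM agents with prompts); only the control flow — Challenger proposes, Reasoner attempts, Judge scores, Proposer/Generator turn failures into skill updates for both sides — follows the paper's description.

```python
def challenger(context, challenger_skills):
    """Stub: generate a probing task and grading rubric from the context."""
    return {"task": f"probe:{len(challenger_skills)}", "rubric": "exact-match"}

def reasoner(task, reasoner_skills):
    """Stub: attempt the task guided by the current skill set."""
    return f"answer-using-{len(reasoner_skills)}-skills"

def judge(task, answer):
    """Stub: neutral binary verdict; a placeholder for rubric-based grading."""
    return len(answer) % 2 == 0

def propose_updates(failures):
    """Stub Proposer/Generator: synthesize failure cases into skill updates."""
    return [f"skill-from-{f['task']}" for f in failures]

def self_play(context, rounds=3):
    """One simplified Ctx2Skill-style loop: skills accumulate from failures."""
    challenger_skills, reasoner_skills = [], []
    for _ in range(rounds):
        item = challenger(context, challenger_skills)
        answer = reasoner(item["task"], reasoner_skills)
        if not judge(item["task"], answer):
            updates = propose_updates([{"task": item["task"], "answer": answer}])
            reasoner_skills.extend(updates)    # refine the Reasoner side
            challenger_skills.extend(updates)  # harden the Challenger side
    return reasoner_skills
```

Note that both agent sides consume the accumulated skills, which is what lets the Challenger escalate its probes as the Reasoner improves.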

Abstract

Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, where LMs directly learn relevant knowledge from the given context. An intuitive solution is inference-time skill augmentation: extracting the rules and procedures from context into natural-language skills. However, constructing such skills for context learning scenarios faces two challenges: the prohibitive cost of manual skill annotation for long, technically dense contexts, and the lack of external feedback for automated skill construction, since there is no automatic signal to tell whether a proposed skill is helpful. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core is a multi-agent self-play loop: a Challenger generates probing tasks and rubrics, a Reasoner attempts to solve them guided by an evolving skill set, and a neutral Judge provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-time Replay mechanism that identifies the Reasoner-side skill set achieving the best balance across representative cases, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to improve its context learning capability. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models.
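The Cross-time Replay mechanism the abstract describes amounts to a selection step over skill-set snapshots saved during self-play: replay each snapshot against a fixed pool of representative cases and keep the one with the best overall balance. The sketch below assumes a simple mean-score criterion, and `score` is a hypothetical stand-in for re-running the Reasoner and Judge on one replayed case; neither detail is specified in the source.

```python
def score(skill_set, case):
    """Stub replay evaluation: 1.0 if the snapshot covers the case's need."""
    return 1.0 if case["needs"] in skill_set else 0.0

def cross_time_replay(snapshots, replay_cases):
    """Pick the skill-set snapshot with the highest mean replay score."""
    def mean_score(skills):
        return sum(score(skills, c) for c in replay_cases) / len(replay_cases)
    return max(snapshots, key=mean_score)
```

Selecting over snapshots from different points in time, rather than always keeping the latest skill set, is what guards against over-specialization to the Challenger's most recent (and most extreme) probes.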