Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling

arXiv cs.LG / 4/3/2026


Key Points

  • The paper studies training methods that jointly develop in-context learning (ICL) and in-weights learning (IWL), aiming to switch between them depending on how relevant the provided context is.
  • It argues that standard fine-tuning can erode ICL, and that prior work shows the emergence of ICL after IC-Train (fine-tuning with in-context examples) depends on factors like task diversity and training duration.
  • The authors find that context selection is critical: random contexts weaken both ICL and IWL, while using only highly similar examples can cause ICL to collapse into label-copying that ignores relevance.
  • They propose “Contrastive-Context” sampling, which mixes similar and random examples within a context and varies similarity grades across contexts to learn stable ICL–IWL mixtures.
  • Extensive experiments on four LLMs across multiple tasks, supported by diagnostic probing and a theoretical minimal-model analysis, show the contrastive setup avoids collapse into purely ICL, purely IWL, or copying behavior.

Abstract

We investigate training strategies that co-develop in-context learning (ICL) and in-weights learning (IWL), along with the ability to switch between them based on context relevance. Although current LLMs exhibit both modes, standard task-specific fine-tuning often erodes ICL, motivating IC-Train, i.e., fine-tuning with in-context examples. Prior work has shown that the emergence of ICL after IC-Train depends on factors such as task diversity and training duration. In this paper we show that the similarity structure between target inputs and context examples also plays an important role. Random contexts lead to loss of ICL and to IWL dominance, while using only similar examples in context causes ICL to degenerate into copying labels without regard to relevance. To address this, we propose a simple Contrastive-Context sampling scheme that enforces two types of contrasts: (1) a mix of similar and random examples within each context, to evolve a correct form of ICL, and (2) varying grades of similarity across contexts, to evolve ICL-IWL mixtures. We present insights on the importance of such contrast with a theoretical analysis of a minimal model, and validate our approach with extensive empirical evaluation on four LLMs and several tasks. Diagnostic probes confirm that contrasted contexts yield stable ICL-IWL mixtures, avoiding collapse into pure ICL, pure IWL, or copying.
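The two contrasts described above can be sketched as a sampling procedure. The snippet below is a minimal illustration, not the authors' implementation: `contrastive_context` is a hypothetical helper that, for one target example, mixes its most-similar neighbors with random examples (the within-context contrast) and scales the similar fraction by a per-context `grade` (the cross-context contrast). Cosine similarity over precomputed embeddings stands in for whatever similarity measure the paper actually uses.

```python
import random

def contrastive_context(target_emb, pool, k_sim=2, k_rand=2, grade=1.0):
    """Build one training context for a target example (illustrative sketch).

    pool: list of (embedding, example) pairs, embeddings as equal-length
    float lists. `grade` in [0, 1] sets what fraction of the `k_sim` slots
    are filled from the most-similar end of the pool; varying `grade` across
    contexts realizes the cross-context similarity contrast.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    # Rank the pool by similarity to the target.
    ranked = sorted(pool, key=lambda p: cosine(target_emb, p[0]), reverse=True)
    n_sim = round(k_sim * grade)  # similarity grade for this context
    similar = [ex for _, ex in ranked[:n_sim]]
    rest = [ex for _, ex in ranked[n_sim:]]
    # Fill the remaining slots with random examples (within-context contrast).
    rand = random.sample(rest, min(k_rand + (k_sim - n_sim), len(rest)))
    context = similar + rand
    random.shuffle(context)  # avoid a fixed similar-first ordering
    return context
```

Sweeping `grade` from 0 (all random, pushing toward IWL) to 1 (maximally similar, pushing toward ICL) across training contexts is one plausible way to expose the model to the full range of context relevance.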
