SIEVE: Sample-Efficient Parametric Learning from Natural Language

arXiv cs.LG / 4/6/2026


Key Points

  • SIEVE is a new approach for sample-efficient parametric learning that adapts language models using natural-language context while updating model weights rather than relying only on prompts.
  • The method uses SIEVE-GEN, a synthetic data generation pipeline that decomposes context to generate higher-quality rollouts by pairing synthetic queries with only the relevant parts of context.
  • SIEVE then applies context distillation to internalize the (decomposed) context into the model, aiming to reduce the number of query examples needed for learning.
  • In evaluations on reasoning tasks where context is essential—such as custom domains, RuleArena, and Machine Translation from One Book—SIEVE achieves better performance than prior context distillation methods with as few as three query examples.
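The decompose-then-pair idea behind SIEVE-GEN can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's implementation: the function names (`decompose`, `make_query`, `build_training_pairs`) are invented, `decompose` naively splits on blank lines, and `make_query` stands in for an LLM call that would write a query answerable from a given context piece.

```python
# Hypothetical sketch of the SIEVE-GEN idea: split a long context into
# self-contained pieces, then pair each synthetic query with only the piece
# it depends on, so rollouts condition on relevant context rather than all of it.

def decompose(context: str) -> list[str]:
    """Naively split context into self-contained units (here: paragraphs)."""
    return [p.strip() for p in context.split("\n\n") if p.strip()]

def make_query(piece: str) -> str:
    """Stand-in for an LLM call that writes a query answerable from `piece`."""
    return f"Apply the following rule: {piece}"

def build_training_pairs(context: str) -> list[tuple[str, str]]:
    """Pair each synthetic query with only its relevant slice of context."""
    return [(make_query(piece), piece) for piece in decompose(context)]

context = "Rule A: refunds within 30 days.\n\nRule B: shipping is free over $50."
for query, piece in build_training_pairs(context):
    print(query, "|", piece)
```

Rollouts generated against these focused pairs are the training signal that the subsequent distillation step internalizes into the weights.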

Abstract

Natural language context, such as instructions, knowledge, or feedback, contains rich signal for adapting language models. While in-context learning provides adaptation via the prompt, parametric learning persists into model weights and can improve performance further, though it is data-hungry and relies heavily on either high-quality traces or automated verifiers. We propose SIEVE, a method for sample-efficient parametric learning from natural language context that requires as few as three query examples. SIEVE uses a novel synthetic data generation pipeline, SIEVE-GEN, that leverages the insight that context is decomposable. Decomposing context allows us to generate higher-quality rollouts by pairing synthetic queries with only the applicable context rather than the entirety of it, then using context distillation to internalize the context into the model. We evaluate in reasoning settings where context is necessary, including custom domains and the RuleArena and Machine Translation from One Book tasks. Our results show that SIEVE outperforms prior context distillation methods using just three query examples, demonstrating how to achieve sample-efficient parametric learning from natural language.
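Context distillation, the internalization step the abstract refers to, is commonly framed as matching a student's next-token distribution (query alone) to a teacher's distribution (context plus query). The following is a toy sketch of that objective under stated assumptions: the distributions are made-up three-token examples, and minimizing this KL term over real model outputs is what would push the context's effect into the weights.

```python
import math

def kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) between two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher: the model conditioned on (context + query).
# Student: the same model given the query alone.
# Context distillation minimizes KL(teacher || student) so the student
# reproduces context-conditioned behavior without seeing the context.
teacher = [0.7, 0.2, 0.1]   # toy p(next token | context, query)
student = [0.4, 0.4, 0.2]   # toy p(next token | query), before training
loss = kl(teacher, student)
print(round(loss, 4))
```

As the student's distribution moves toward the teacher's, the loss falls to zero, at which point the context is no longer needed in the prompt for these queries.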