Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL

arXiv cs.LG · March 23, 2026


Key Points

  • The paper provides a theoretical analysis of In-Context Learning (ICL) under mild assumptions, linking demonstration design, Chain-of-Thought prompting, the number of demonstrations, and prompt templates to generalization.
  • It derives an upper bound on the ICL test loss, showing that performance depends on the quality of demonstrations (quantified via Lipschitz properties), the model's intrinsic ICL capability, and the degree of distribution shift.
  • It analyzes Chain-of-Thought prompting as a form of task decomposition, beneficial when demonstrations are well-chosen for each substep and the subtasks are easier to learn.
  • It discusses how ICL's sensitivity to prompt templates varies with the number of demonstrations and provides experiments that corroborate the theoretical insights.
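The design choices the analysis covers — which demonstrations to include, how many, whether to add Chain-of-Thought rationales, and which prompt template to use — all surface concretely in how the prompt string is assembled. As a minimal, hypothetical sketch (names and templates are illustrative, not from the paper):

```python
def build_icl_prompt(demos, query, template="Q: {x}\nA: {y}", use_cot=False):
    """Assemble a few-shot ICL prompt from (input, rationale, answer) demos.

    `template` is the prompt template the paper's sensitivity analysis
    refers to; `use_cot` switches on Chain-of-Thought style answers that
    expose intermediate substeps (task decomposition).
    """
    parts = []
    for x, rationale, y in demos:
        if use_cot:
            # CoT demonstration: show the substeps before the final answer.
            answer = f"{rationale} So the answer is {y}."
        else:
            # Plain demonstration: input-output pair only.
            answer = y
        parts.append(template.format(x=x, y=answer))
    # The test query reuses the same template with the answer left blank.
    parts.append(template.format(x=query, y="").rstrip())
    return "\n\n".join(parts)

demos = [
    ("2 + 3 * 4", "3 * 4 = 12, and 2 + 12 = 14.", "14"),
    ("(1 + 1) * 5", "1 + 1 = 2, and 2 * 5 = 10.", "10"),
]
print(build_icl_prompt(demos, "7 + 2 * 3", use_cot=True))
```

In this framing, demonstration selection picks `demos`, the number of demonstrations is `len(demos)`, and template sensitivity asks how much the model's output changes if `template` is varied while everything else is held fixed.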

Abstract

In-Context Learning (ICL) enables pretrained LLMs to adapt to downstream tasks by conditioning on a small set of input-output demonstrations, without any parameter updates. Although there have been many theoretical efforts to explain how ICL works, most either rely on strong architectural or data assumptions, or fail to capture the impact of key practical factors such as demonstration selection, Chain-of-Thought (CoT) prompting, the number of demonstrations, and prompt templates. We address this gap by establishing a theoretical analysis of ICL under mild assumptions that links these design choices to generalization behavior. We derive an upper bound on the ICL test loss, showing that performance is governed by (i) the quality of selected demonstrations, quantified by Lipschitz constants of the ICL loss along paths connecting test prompts to pretraining samples, (ii) an intrinsic ICL capability of the pretrained model, and (iii) the degree of distribution shift. Within the same framework, we analyze CoT prompting as inducing a task decomposition and show that it is beneficial when demonstrations are well chosen at each substep and the resulting subtasks are easier to learn. Finally, we characterize how the sensitivity of ICL performance to prompt templates varies with the number of demonstrations. Together, our study shows that pretraining equips the model to generalize beyond observed tasks; CoT enables it to compose simpler subtasks into more complex ones; and demonstrations and instructions enable it to retrieve similar tasks, including those that can be composed into more complex ones; jointly, these mechanisms support generalization to unseen tasks. All theoretical insights are corroborated by experiments.
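Schematically, a bound of the kind described would combine the three terms additively. The notation below is an illustrative placeholder, not the paper's actual result: $S$ denotes the selected demonstrations, $\mathrm{Lip}(S)$ the Lipschitz constant of the ICL loss along the path from the test prompt to pretraining samples, and $d(\cdot,\cdot)$ the length of that path.

```latex
% Illustrative schematic only; the paper's exact bound and notation differ.
\[
  \mathcal{L}_{\mathrm{test}}(S)
  \;\le\;
  \underbrace{\mathcal{L}_{\mathrm{ICL}}}_{\text{intrinsic ICL capability}}
  \;+\;
  \underbrace{\mathrm{Lip}(S)\cdot d\!\left(P_{\mathrm{test}},\, P_{\mathrm{pre}}\right)}_{\text{demonstration quality along the path}}
  \;+\;
  \underbrace{\Delta_{\mathrm{shift}}}_{\text{distribution shift}}
\]
```

Under this reading, well-chosen demonstrations shrink the middle term by keeping the ICL loss smooth along the path back to pretraining data, while the first and last terms are fixed by the pretrained model and the task distribution.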