Emergent Inference-Time Semantic Contamination via In-Context Priming

arXiv cs.CL / 4/7/2026

📰 NewsSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper argues that LLMs can exhibit “inference-time semantic contamination” where injecting certain few-shot examples causes measurable distributional shifts in later, semantically unrelated prompts.
It revisits prior claims that k-shot prompting alone is insufficient, showing the effect can occur but depends on model capability, with richer models exhibiting stronger drift.
In a controlled setup using five culturally loaded numbers as demonstrations, the authors observe shifts toward darker, authoritarian, and stigmatized themes for capable models, while a simpler/smaller model shows no significant effect.
The study also finds that even structurally inert demonstrations (nonsense strings) can perturb output distributions, suggesting two mechanisms: structural-format contamination and semantic-content contamination.
The authors map boundary conditions for when contamination occurs and highlight direct security implications for LLM applications that rely on few-shot prompting.

Abstract

Recent work has shown that fine-tuning large language models (LLMs) on insecure code or culturally loaded numeric codes can induce emergent misalignment, causing models to produce harmful content in unrelated downstream tasks. The authors of that work concluded that

k

-shot prompting alone does not induce this effect. We revisit this conclusion and show that inference-time semantic drift is real and measurable; however, it requires models of large-enough capability. Using a controlled experiment in which five culturally loaded numbers are injected as few-shot demonstrations before a semantically unrelated prompt, we find that models with richer cultural-associative representations exhibit significant distributional shifts toward darker, authoritarian, and stigmatized themes, while a simpler/smaller model does not. We additionally find that structurally inert demonstrations (nonsense strings) perturb output distributions, suggesting two separable mechanisms: structural format contamination and semantic content contamination. Our results map the boundary conditions under which inference-time contamination occurs, and carry direct implications for the security of LLM-based applications that use few-shot prompting.

💡 Insights using this article

This article is featured in our daily AI news digest — key takeaways and action items at a glance.

📅 4/7DailyView insight →

Black Hat Asia

AI Business

Amazon CEO takes aim at Nvidia, Intel, Starlink, more in annual shareholder letter

TechCrunch

Why Anthropic’s new model has cybersecurity experts rattled

Reddit r/artificial

Does the AI 2027 paper still hold any legitimacy?

Reddit r/artificial

Why Most Productivity Systems Fail (And What to Do Instead)

Dev.to

Emergent Inference-Time Semantic Contamination via In-Context Priming

Key Points

Abstract

💡 Insights using this article

Related Articles

Black Hat Asia

Amazon CEO takes aim at Nvidia, Intel, Starlink, more in annual shareholder letter

Why Anthropic’s new model has cybersecurity experts rattled

Does the AI 2027 paper still hold any legitimacy?

Why Most Productivity Systems Fail (And What to Do Instead)

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer