Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

arXiv cs.CL / 4/16/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper shows that as language models scale up, they become better at resisting counterfactual misinformation in context while simultaneously becoming worse at ignoring irrelevant tokens.
  • It introduces the first scaling laws for “contextual entrainment,” a behavior where models preferentially use tokens that appear in the provided context regardless of whether they are relevant.
  • Using Cerebras-GPT (111M–13B) and Pythia (410M–12B), the authors find contextual entrainment scales as a predictable power law, but trends differ by context type: semantic contexts show decreasing entrainment with larger models, while non-semantic contexts show increasing entrainment.
  • The results quantify a divergence: the largest models are about 4× more resistant to counterfactual misinformation yet about 2× more prone to copying arbitrary tokens.
  • The study concludes that “semantic filtering” and “mechanical copying” behave as distinct mechanisms that scale in opposite directions, meaning scaling alone does not eliminate context sensitivity.

Abstract

Larger language models become simultaneously better and worse at handling contextual information -- better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find entrainment follows predictable power-law scaling, but with opposite trends depending on context type: semantic contexts show decreasing entrainment with scale, while non-semantic contexts show increasing entrainment. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest, yet simultaneously twice as prone to copying arbitrary tokens. These diverging trends, which replicate across model families, suggest that semantic filtering and mechanical copying are functionally distinct behaviors that scale in opposition -- scaling alone does not resolve context sensitivity, it reshapes it.
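The power-law claim can be made concrete with a small sketch: fit E = a·N^b in log-log space, where N is parameter count and the sign of the exponent b gives the direction of the trend (negative for semantic contexts, positive for non-semantic ones). The data, coefficients, and exponents below are invented purely for illustration and are not the paper's measurements; they are chosen only to mirror the reported directions.

```python
# Illustrative sketch (not the paper's code): recovering a power-law
# scaling exponent for entrainment from (model size, entrainment) pairs.
import numpy as np

def fit_power_law(n_params, entrainment):
    """Fit E = a * N^b via linear regression in log-log space.

    Returns (a, b); the sign of b gives the direction of the trend.
    """
    b, log_a = np.polyfit(np.log(n_params), np.log(entrainment), 1)
    return np.exp(log_a), b

# Model sizes in parameters (the 111M-13B range matches Cerebras-GPT;
# the intermediate sizes here are illustrative).
sizes = np.array([111e6, 256e6, 590e6, 1.3e9, 2.7e9, 6.7e9, 13e9])

# Synthetic entrainment scores generated from assumed exponents.
semantic = 0.8 * sizes ** -0.05       # decreases with scale
non_semantic = 0.01 * sizes ** 0.04   # increases with scale

_, b_sem = fit_power_law(sizes, semantic)
_, b_non = fit_power_law(sizes, non_semantic)
print(f"semantic exponent: {b_sem:.3f}")      # negative: entrainment falls
print(f"non-semantic exponent: {b_non:.3f}")  # positive: entrainment rises
```

Because the synthetic data are exact power laws, the fit recovers the assumed exponents; on real measurements the same procedure would yield noisy estimates whose signs encode the diverging trends the paper reports.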