Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information
arXiv cs.CL / 3/13/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper introduces the Compression-Consistency Principle: next-token prediction favors hypotheses that yield shorter, internally consistent descriptions of the training data (illustrated in the sketch after this list).
- It argues that truth bias in language models is not an intrinsic drive toward truth but arises when false alternatives are harder to compress structurally.
- In experiments with GPT-2–style models on synthetic data, correct completions reach 83.1% accuracy with balanced training data and 67.0% when correct rules make up only 10% of the corpus.
- Replacing random errors with a coherent but incorrect rule system largely eliminates the preference for correctness, driving accuracy toward chance; the effect is weaker but still present in more natural-language-like synthetic settings (57.7% accuracy).
- The authors show that embedding verification steps can restore the preference for correctness at small scales and that more consistent rules yield graded accuracy improvements, suggesting that the observed truth bias is a byproduct of compression pressure rather than an intrinsic truth-seeking drive.
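The principle lends itself to a rough illustration using an off-the-shelf compressor as a stand-in for a model's implicit description length. The following is a minimal sketch, not the paper's setup: the "subject is attribute" fact format, the 50-subject rule systems, the corpus sizes, and zlib as a description-length proxy are all invented here for illustration.

```python
import random
import zlib

random.seed(0)

# Hypothetical toy setup: 50 subjects, each mapped to an attribute by a rule.
SUBJECTS = [f"s{i}" for i in range(50)]
TRUE_RULE = {s: f"x{i % 5}" for i, s in enumerate(SUBJECTS)}  # the "correct" rule system
ALT_RULE = {s: f"y{i % 5}" for i, s in enumerate(SUBJECTS)}   # coherent but incorrect system


def corpus(error_rate: float, coherent_errors: bool) -> bytes:
    """Generate 2000 'subject is attribute' statements.

    error_rate is the fraction of statements that contradict TRUE_RULE.
    If coherent_errors is True, every error follows ALT_RULE (a single
    consistent falsehood); otherwise each error picks a random attribute.
    """
    lines = []
    for _ in range(2000):
        s = random.choice(SUBJECTS)
        if random.random() < error_rate:
            attr = ALT_RULE[s] if coherent_errors else f"z{random.randrange(1000)}"
        else:
            attr = TRUE_RULE[s]
        lines.append(f"{s} is {attr}")
    return "\n".join(lines).encode()


# Compare compressed sizes: a crude proxy for description length.
for label, err, coherent in [("all correct", 0.0, False),
                             ("30% random errors", 0.3, False),
                             ("30% coherent wrong rule", 0.3, True)]:
    size = len(zlib.compress(corpus(err, coherent), 9))
    print(f"{label:>25}: {size} compressed bytes")
```

On this toy data, the expected pattern is that the all-correct corpus compresses smallest, random errors inflate the compressed size sharply, and the coherent-but-wrong corpus sits close to the correct one, mirroring the paper's point that compression pressure alone cannot distinguish a consistent falsehood from the truth.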