How political censorship actually works inside Qwen, DeepSeek, GLM, and Yi: Ablation and behavioral results across 9 models

Reddit r/LocalLLaMA / 3/24/2026

💬 OpinionIdeas & Deep AnalysisModels & Research

共有:

Key Points

A new arXiv paper analyzes how political censorship is implemented inside multiple Chinese-origin LLMs, showing that “refusal rates” can be misleading because some newer Qwen models stop refusing and instead always respond with strongly steered CCP framing.
For Qwen3-8B, removing the model’s political-sensitivity direction causes substantial confabulation (e.g., swapping historical events), implying that the censorship mechanism is entangled with factual knowledge representations in that model.
In contrast, similar ablation on GLM, DeepSeek, and Phi yields accurate factual outputs with no wrong-event confabulations, suggesting different internal architectures for handling political sensitivity.
For Yi-1.5-9B, probes detect political content at multiple layers but the model neither refuses nor steers, indicating concept detection and behavioral “routing” are learned independently and may be separated in training.
Cross-model tests show the relevant “political direction” is not universal (e.g., Qwen3-8B’s direction does not transfer meaningfully to GLM-4-9B), and a larger 46-model screening finds strong CCP-specific discrimination only in a small subset, highlighting fragility of small-sample conclusions.

New paper studying the internal mechanisms of political censorship in Chinese-origin LLMs: https://arxiv.org/abs/2603.18280

Findings relevant to this community:

On Qwen/Alibaba - the generational shift: Across Qwen2.5-7B → Qwen3-8B → Qwen3.5-4B → Qwen3.5-9B, hard refusal went from 6.2% to 25% to 0% to 0%. But steering (CCP narrative framing) rose from 4.33/5 to 5.00/5 over the same period. The newest Qwen models don't refuse - they answer everything in maximally steered language. Any evaluation that counts refusals would conclude Qwen3.5 is less censored. It isn't.

On Qwen3-8B - the confabulation problem: When you surgically remove the political-sensitivity direction, Qwen3-8B doesn't give factual answers. It substitutes Pearl Harbor for Tiananmen and Waterloo for the Hundred Flowers campaign. 72% confabulation rate. Its architecture entangles factual knowledge with the censorship mechanism. Safety-direction ablation on the same model produces 0% wrong events, so it's specific to how Qwen encoded political concepts.

On GLM, DeepSeek, Phi - clean ablation: Same procedure on these three models produces accurate factual output. Zero wrong-event confabulations. Remove the censorship direction and the model simply answers the question.

On Yi - detection without routing: Yi-1.5-9B detects political content at every layer (probes work) but never refuses (0% English, 6.2% Chinese) and shows no steering. It recognized the sensitivity and did nothing with it. Post-training never installed a routing policy for political content. This is direct evidence that concept detection and behavioral routing are independently learned.

On cross-model transfer: Qwen3-8B's political direction applied to GLM-4-9B: cosine 0.004. Completely meaningless. Different labs built completely different geometry. There's no universal "uncensor" direction.

On the 46-model screen: Only 4 models showed strong CCP-specific discrimination at n=32 prompts (Baidu ERNIE, Qwen3-8B, Amazon Nova, Meituan). All Western frontier models: zero. An initial n=8 screen was misleading - Moonshot Kimi-K2 dropped from +88pp to +9pp, DeepSeek v3-0324 from +75pp to -3pp, MiniMax from +61pp to 0pp. Small-sample behavioral claims are fragile.

Paper: https://arxiv.org/abs/2603.18280

Happy to answer questions.

submitted by /u/Logical-Employ-9692
[link] [comments]

Engenharia de Prompt: Por Que a Forma Como Você Pergunta Muda Tudo(Um guia introdutório)

Dev.to

The Obligor

Dev.to

The Markup

Dev.to

2026 年 AI 部落格變現完整攻略：從第一篇文章到月收入 $1000

Dev.to

OpenSeeker's open-source approach aims to break up the data monopoly for AI search agents

THE DECODER

How political censorship actually works inside Qwen, DeepSeek, GLM, and Yi: Ablation and behavioral results across 9 models

Key Points

Related Articles

Engenharia de Prompt: Por Que a Forma Como Você Pergunta Muda Tudo(Um guia introdutório)

The Obligor

The Markup

2026 年 AI 部落格變現完整攻略：從第一篇文章到月收入 $1000

OpenSeeker's open-source approach aims to break up the data monopoly for AI search agents

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer