Transformers Can Learn Rules They've Never Seen: Proof of Computation Beyond Interpolation
arXiv cs.LG / March 19, 2026
Key Points
- The paper tests whether transformers can infer rules absent from their training data, challenging interpolation-only accounts of transformer computation in two controlled experiments.
- Experiment 1 uses a cellular automaton with an XOR update rule and held-out input patterns on which similarity-based predictors fail; a two-layer transformer nonetheless learns the rule, and circuit extraction identifies an XOR computation in which multi-step constraint propagation is key (a setup sketch follows this list).
- Experiment 2 studies symbolic operator chains over integers with one operator pair held out, requiring the model to produce intermediate-step derivations; across all 49 holdout pairs, the transformer surpasses every interpolation baseline, and its performance degrades without intermediate-step supervision (see the data-generation sketch below).
- The work also demonstrates that a standard transformer block can implement exact local Boolean rules, providing an existence proof that transformers can learn and compute rules they have never seen, while leaving open when such behavior arises in large-scale training (a worked XOR construction appears below).
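To make the holdout logic in Experiment 1 concrete, here is a minimal sketch of that kind of setup, not the paper's actual construction: the automaton width, neighborhood, and holdout prefix below are all assumptions. It shows why a purely similarity-based predictor cannot be exact on inputs whose patterns never appear in training, while the local XOR rule predicts every cell.

```python
import numpy as np

WIDTH = 10  # small enough to enumerate all 2**WIDTH inputs

def step(state: np.ndarray) -> np.ndarray:
    """One update of a 1-D XOR automaton: each cell becomes the XOR of
    its two neighbors (periodic boundary)."""
    return np.roll(state, 1) ^ np.roll(state, -1)

# Hold out every input whose first three cells match a fixed pattern,
# so those inputs are guaranteed never to appear in training.
HELD_OUT_PREFIX = (1, 0, 1)

all_states = [np.array(bits, dtype=np.uint8)
              for bits in np.ndindex(*(2,) * WIDTH)]
train = [s for s in all_states if tuple(s[:3]) != HELD_OUT_PREFIX]
test = [s for s in all_states if tuple(s[:3]) == HELD_OUT_PREFIX]

def nearest_neighbor_predict(x: np.ndarray) -> np.ndarray:
    """Similarity-based baseline: copy the stored output of the
    Hamming-nearest training input."""
    nearest = min(train, key=lambda s: int(np.sum(x ^ s)))
    return step(nearest)

# The nearest training input always differs somewhere inside the
# held-out prefix, so the copied output is wrong at the cells adjacent
# to that difference; the rule itself would score 1.0.
acc = np.mean([nearest_neighbor_predict(x) == step(x) for x in test])
print(f"per-cell accuracy of the similarity baseline: {acc:.2f}")  # < 1.0
```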
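The operator-chain setup in Experiment 2 can be sketched the same way. Everything concrete below is an assumption: the summary does not name the operators, the modulus, or the chain length (the 49 holdout pairs suggest a 7-operator vocabulary, but this toy uses 3). The point is the shape of the data: chains whose targets spell out every intermediate result, with one ordered operator pair excluded from training.

```python
import random

MOD = 97  # assumed modulus so values stay bounded integers
OPS = {
    "add": lambda a, b: (a + b) % MOD,
    "sub": lambda a, b: (a - b) % MOD,
    "mul": lambda a, b: (a * b) % MOD,
}
HELD_OUT_PAIR = ("add", "mul")  # this adjacent ordered pair never trains

def make_example(x0, op_seq, args):
    """Render a chain such as `5 add 3 mul 2 = 8 -> 16`, where the target
    includes every intermediate result (step-level supervision)."""
    val, steps = x0, []
    for op, arg in zip(op_seq, args):
        val = OPS[op](val, arg)
        steps.append(val)
    prompt = str(x0) + "".join(f" {op} {a}" for op, a in zip(op_seq, args))
    return f"{prompt} = {' -> '.join(map(str, steps))}"

random.seed(0)
train, test = [], []
for _ in range(1000):
    ops = random.choices(list(OPS), k=3)
    x0 = random.randrange(MOD)
    args = [random.randrange(MOD) for _ in ops]
    example = make_example(x0, ops, args)
    # An example is held out iff the excluded pair occurs adjacently.
    if HELD_OUT_PAIR in set(zip(ops, ops[1:])):
        test.append(example)
    else:
        train.append(example)

print(len(train), "train /", len(test), "held-out chains")
```

Ablating intermediate-step supervision in this sketch would mean keeping only the final value after the `=`, which is the training variant the summary reports as degrading.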
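Finally, the existence claim in the last point can be checked directly for the XOR case. The weights below are a standard two-hidden-unit ReLU construction for XOR, the same shape as a transformer block's feed-forward sublayer; they are a textbook illustration, not the circuit extracted in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

W1 = np.array([[1.0, 1.0],    # hidden pre-activations: [a + b, a + b]
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])    # h = [relu(a + b), relu(a + b - 1)]
w2 = np.array([1.0, -2.0])    # output = h1 - 2 * h2

def xor_mlp(a, b):
    """Two-hidden-unit ReLU feed-forward network computing XOR exactly."""
    h = relu(np.array([a, b]) @ W1 + b1)
    return h @ w2

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert int(xor_mlp(a, b)) == a ^ b
print("a 2-unit ReLU feed-forward layer computes XOR exactly")
```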
Related Articles
How political censorship actually works inside Qwen, DeepSeek, GLM, and Yi: Ablation and behavioral results across 9 models
Reddit r/LocalLLaMA
Prompt Engineering: Why the Way You Ask Changes Everything (An Introductory Guide)
Dev.to
The Obligor
Dev.to
The Markup
Dev.to
The Complete 2026 Guide to AI Blog Monetization: From Your First Post to $1,000 in Monthly Income
Dev.to