Barriers to Universal Reasoning With Transformers (And How to Overcome Them)

arXiv cs.LG / 4/29/2026


Key Points

  • The paper asks whether Transformers trained with chain-of-thought (CoT) can generalize to CoT traces longer than those seen during training, a question that remains understudied.
  • It finds that, under standard positional encodings and a finite alphabet, CoT-capable Transformers cannot solve problems beyond TC^0 once length-generalizable learnability is required.
  • The authors show that letting the vocabulary scale with problem size enables a length-generalizable simulation of Turing machines, with the CoT trace length growing linearly in the simulated runtime (up to a constant factor).
  • The construction addresses two key obstacles to reliable length generalization (repeated copying and last-occurrence retrieval) by assigning each tape position a unique "signpost" token and logging only value changes, so the current tape state can be reconstructed by counting; see the sketch after this list.
  • Empirically, the paper demonstrates that signpost tokens and value-change encodings offer actionable guidance for improving length generalization on hard tasks.
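
To make the mechanism in the last two points concrete, here is a minimal Python sketch. It is illustrative only: the function names, the token format `P{i}`, and the binary tape alphabet are our assumptions, not the paper's construction. The abstract says the current symbol is recovered "through counts"; with a binary alphabet, one natural counting scheme is the parity of logged changes.

```python
# Sketch of the signpost-token / value-change encoding (illustrative;
# identifiers and the binary-alphabet assumption are ours, not the paper's).

def encode_trace(initial_tape, writes):
    """Log only value *changes* as (signpost, new_value) pairs.

    initial_tape: list of 0/1 symbols; position i gets the unique signpost "P{i}".
    writes: sequence of (position, value) writes made by the simulated machine.
    """
    tape = list(initial_tape)
    trace = []
    for pos, val in writes:
        if tape[pos] != val:                # log only when the cell actually changes
            trace.append((f"P{pos}", val))  # unique signpost token per tape position
            tape[pos] = val
    return trace

def read_cell(initial_tape, trace, pos):
    """Recover the current symbol by *counting* signpost occurrences.

    With a binary alphabet, every logged change flips the cell, so the current
    value is the initial value XOR (number of logged changes mod 2).
    """
    flips = sum(1 for token, _ in trace if token == f"P{pos}")
    return initial_tape[pos] ^ (flips % 2)

tape0 = [0, 1, 0, 0]
writes = [(2, 1), (2, 1), (0, 1), (2, 0)]   # the second write is a no-op
trace = encode_trace(tape0, writes)         # [("P2", 1), ("P0", 1), ("P2", 0)]
assert read_cell(tape0, trace, 2) == 0
assert read_cell(tape0, trace, 0) == 1
```

Because a cell's value follows from a count over its signpost tokens, the model never has to re-copy the tape (the repeated-copying barrier) or locate the most recent write to a position (the last-occurrence-retrieval barrier).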

Abstract

Chain-of-Thought (CoT) has been shown to empirically improve Transformers' performance and to theoretically increase their expressivity to Turing completeness. However, whether Transformers can learn to generalize to CoT traces longer than those seen during training is understudied. We use recent theoretical frameworks for Transformer length generalization and find that, under standard positional encodings and a finite alphabet, Transformers with CoT cannot solve problems beyond TC^0; i.e., the expressivity benefits do not hold under the stricter requirement of length-generalizable learnability. However, if we allow the vocabulary to grow with problem size, we attain a length-generalizable simulation of Turing machines where the CoT trace length is linear in the simulated runtime up to a constant. Our construction overcomes two core obstacles to reliable length generalization: repeated copying and last-occurrence retrieval. We assign each tape position a unique signpost token and log only value changes, enabling recovery of the current tape symbol through counts and circumventing both barriers. Further, we empirically show that the use of such signpost tokens and value-change encodings provides actionable guidance for improving length generalization on hard problems.
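
To see why the resulting CoT trace stays linear in the simulated runtime, consider a toy Turing-machine simulator that emits this change-log encoding: each step writes at most one cell, so at most one (signpost, value) pair is appended per step. The machine, transition table, and token format below are hypothetical, chosen only to exercise the idea from the abstract; note that the signposts `P0, P1, ...` form a vocabulary that grows with the tape, matching the paper's growing-vocabulary requirement.

```python
# Toy Turing-machine simulation emitting the change-log trace (hypothetical
# machine and encoding details; only the idea comes from the abstract).

def simulate(delta, state, tape, head, max_steps):
    """delta maps (state, symbol) -> (new_state, new_symbol, head_move)."""
    trace = []
    for _ in range(max_steps):
        if not (0 <= head < len(tape)) or (state, tape[head]) not in delta:
            break                                  # halt: head off tape or no rule
        state, new_sym, move = delta[(state, tape[head])]
        if tape[head] != new_sym:                  # log only actual value changes
            trace.append((f"P{head}", new_sym))    # unique signpost per cell
            tape[head] = new_sym
        head += move
    return trace

# A machine that flips every bit while scanning right, then halts off-tape.
delta = {("scan", 0): ("scan", 1, +1), ("scan", 1): ("scan", 0, +1)}
print(simulate(delta, "scan", [0, 1, 1, 0], head=0, max_steps=100))
# -> [('P0', 1), ('P1', 0), ('P2', 0), ('P3', 1)]: one log entry per step,
#    so the trace length is at most the number of simulated steps.
```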