Stability and Generalization in Looped Transformers

arXiv cs.LG / April 17, 2026

📰 News · Models & Research

Key Points

  • The paper proposes a fixed-point analysis framework for looped transformers to determine which architectural choices enable generalization to harder test-time inputs.
  • The authors analyze stability along three axes—reachability, input-dependence, and geometry—and provide theoretical conditions under which fixed-point iteration yields meaningful predictions.
  • They prove that looped networks without recall have countably many fixed points and cannot achieve strong input-dependence in any spectral regime, limiting their extrapolation ability.
  • By combining recall with outer normalization, the study identifies a regime where fixed points are reachable, locally smooth with respect to inputs, and support stable backpropagation.
  • Experiments on chess, sudoku, and prefix-sums show downstream performance aligns with the framework’s predictions, and a new “internal recall” variant can outperform standard recall placement when outer normalization is used.

Abstract

Looped transformers promise test-time compute scaling by spending more iterations on harder problems, but it remains unclear which architectural choices let them extrapolate to harder problems at test time rather than memorize training-specific solutions. We introduce a fixed-point-based framework for analyzing looped architectures along three axes of stability -- reachability, input-dependence, and geometry -- and use it to characterize when fixed-point iteration yields meaningful predictions. Theoretically, we prove that looped networks without recall have countably many fixed points and cannot achieve strong input-dependence in any spectral regime, while recall combined with outer normalization reliably produces a regime in which fixed points are simultaneously reachable, locally smooth in the input, and supported by stable backpropagation. Empirically, we train single-layer looped transformers on chess, sudoku, and prefix-sums and find that downstream performance tracks the framework's predictions across tasks and architectural configurations. We additionally introduce internal recall, a novel recall placement variant, and show that it becomes competitive with -- and on sudoku, substantially better than -- standard recall placement once outer normalization is applied.
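
To build intuition for the recall distinction above, the sketch below runs fixed-point iteration on a hypothetical single-layer update, with and without re-injecting the input at every step. The tanh/linear layer, its dimensions, and the contraction scaling are illustrative assumptions, not the paper's architecture, and outer normalization is omitted for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single "layer": a small random linear map plus tanh,
# scaled so the update is a contraction (illustrative, not the paper's model).
W = 0.3 * rng.normal(size=(8, 8)) / np.sqrt(8)

def iterate(step, h0, n_iters=500, tol=1e-10):
    """Run fixed-point iteration h <- step(h) until the update stalls."""
    h = h0.copy()
    for _ in range(n_iters):
        h_new = step(h)
        if np.linalg.norm(h_new - h) < tol:
            break
        h = h_new
    return h

x1, x2 = rng.normal(size=8), rng.normal(size=8)

# Without recall, the input enters only through the initial state, and a
# contractive update forgets it: both runs collapse to the same fixed point.
no_recall = lambda h: np.tanh(W @ h)
f1 = iterate(no_recall, x1)
f2 = iterate(no_recall, x2)

# With recall, the input is re-injected at every step, so the fixed point
# h* = tanh(W h* + x) depends on x.
g1 = iterate(lambda h: np.tanh(W @ h + x1), x1)
g2 = iterate(lambda h: np.tanh(W @ h + x2), x2)

print(np.linalg.norm(f1 - f2))  # near zero: input-independent fixed point
print(np.linalg.norm(g1 - g2))  # order one: input-dependent fixed points
```

This mirrors the paper's claim only in caricature: without recall the fixed point carries no information about the input, while recall makes the fixed point a (locally smooth) function of the input, which is what makes extra test-time iterations useful.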