Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task
arXiv cs.LG / 4/15/2026
Key Points
- The paper tests whether transformer models use their depth adaptively as task difficulty increases, using a multi-hop relational reasoning benchmark in which difficulty is controlled by the number of reasoning “hops.”
- It evaluates adaptation with two probing methods: early-layer readouts (the logit lens) to track how predictions evolve across layers, and causal patching to measure where task-relevant information is integrated across token positions (a minimal sketch of the logit-lens probe follows this list).
- Pretrained models show only limited adaptive-depth behavior: easier tasks can sometimes be solved with fewer layers, while longer reasoning chains generally require more layers for cross-token integration.
- For models fine-tuned on the task, evidence for adaptive depth becomes clearer and more consistent, and the effect is stronger under looser fine-tuning that does not preserve general language-modeling capabilities.
- The findings suggest that apparent depth adaptation depends on training regime and may be more pronounced when fine-tuning shapes computation toward the specific reasoning task.
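To make the first probe concrete, below is a minimal, hypothetical sketch of a logit-lens readout on a toy 2-hop relational prompt. It uses GPT-2 via Hugging Face `transformers` purely for illustration; the paper's actual models, benchmark format, and metrics are not reproduced here, and the prompt text, model name, and decoding of the top token are all assumptions. Causal patching would additionally re-run the model while swapping selected hidden states between a clean and a corrupted prompt, which is not shown.

```python
# Minimal logit-lens sketch (illustrative only; not the paper's exact setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model, not necessarily one used in the paper
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# A toy 2-hop relational prompt (hypothetical; the benchmark's real format may differ).
prompt = "Alice's mother is Carol. Carol's doctor is Dana. Alice's mother's doctor is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Project each intermediate residual stream through the final layer norm and the
# unembedding matrix, then read the top predicted token at the last position.
ln_f, lm_head = model.transformer.ln_f, model.lm_head
for layer, hidden in enumerate(out.hidden_states):
    logits = lm_head(ln_f(hidden[:, -1, :]))
    top = tok.decode(logits.argmax(dim=-1))
    print(f"layer {layer:2d}: top prediction = {top!r}")
```

Read against the paper's framing: if the top prediction settles on the correct answer several layers before the final one for easy (few-hop) prompts but only near the last layers for harder ones, that is the kind of layer-wise signal the authors treat as evidence of adaptive depth use.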