Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task

arXiv cs.LG / April 15, 2026


Key Points

  • The paper tests whether transformer models can use their depth adaptively as task difficulty increases, using a multi-hop relational reasoning benchmark with difficulty set by the number of reasoning “hops.”
  • It evaluates adaptation using two probing methods: early layer readouts (logit lens) to track prediction evolution and causal patching to measure how task-relevant information is integrated across tokens.
  • Pretrained models show only limited adaptive-depth behavior: easier tasks can sometimes be solved with fewer layers, while longer reasoning chains generally require more layers for cross-token integration.
  • For models finetuned on the task, evidence for adaptive depth becomes clearer and more consistent, and the effect is stronger under looser finetuning that does not preserve general language-modeling capabilities.
  • The findings suggest that apparent depth adaptation depends on training regime and may be more pronounced when fine-tuning shapes computation toward the specific reasoning task.
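The first probing method mentioned above, the logit lens, reads out a prediction from each intermediate layer by projecting its hidden state through the model's unembedding matrix, revealing at which depth the answer stabilizes. Below is a minimal NumPy sketch of that idea on random data; the dimensions, weights, and `hidden_states` are invented stand-ins, not the paper's actual model or setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab, n_layers = 16, 50, 4

# Hypothetical residual-stream states for one token position, one per layer
# (in a real transformer these would be cached during a forward pass).
hidden_states = [rng.normal(size=d_model) for _ in range(n_layers)]

# Shared unembedding matrix mapping hidden states to vocabulary logits.
W_U = rng.normal(size=(d_model, vocab))

def logit_lens(h, W_U):
    """Early readout: project an intermediate hidden state to vocab logits."""
    return h @ W_U

# Track how the top predicted token evolves across depth; if the final answer
# appears at an early layer, later layers add little for this input.
per_layer_top = [int(np.argmax(logit_lens(h, W_U))) for h in hidden_states]
print(per_layer_top)
```

In the paper's framing, an "adaptive" model would show the correct answer emerging at earlier layers for 1-hop questions than for 3-hop ones.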

Abstract

We investigate whether transformers use their depth adaptively across tasks of increasing difficulty. Using a controlled multi-hop relational reasoning task based on family stories, where difficulty is determined by the number of relationship hops that must be composed, we monitor (i) how predictions evolve across layers via early readouts (the logit lens) and (ii) how task-relevant information is integrated across tokens via causal patching. For pretrained models, we find some limited evidence for adaptive depth use: some larger models need fewer layers to arrive at plausible answers for easier tasks, and models generally use more layers to integrate information across tokens as chain length increases. For models finetuned on the task, we find clearer and more consistent evidence of adaptive depth use, with the effect being stronger for less constrained finetuning regimes that do not preserve general language modeling abilities.
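The second probe, causal patching, swaps an activation cached from a "clean" run into a "corrupted" run and measures how much the output recovers; layers where patching matters are the ones doing the cross-token integration. The toy two-layer network below is a hedged illustration of the intervention only, with invented weights, and does not reproduce the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

# Toy two-layer network standing in for a transformer's layers.
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def forward(x, patch=None):
    """Run the toy model; optionally overwrite the layer-1 activation."""
    h1 = np.tanh(x @ W1)
    if patch is not None:
        h1 = patch  # causal intervention: swap in a cached activation
    return np.tanh(h1 @ W2)

clean, corrupted = rng.normal(size=d), rng.normal(size=d)

# Cache the clean run's intermediate activation.
h1_clean = np.tanh(clean @ W1)

out_clean = forward(clean)
out_corrupted = forward(corrupted)
# Patch the clean activation into the corrupted run; since the entire
# layer-1 state is replaced, the output fully recovers the clean output.
out_patched = forward(corrupted, patch=h1_clean)

recovery = np.linalg.norm(out_patched - out_clean)
print(recovery)
```

Real studies patch a single layer or token position at a time, so recovery is partial; mapping which (layer, position) patches restore the answer shows at what depth task-relevant information moves between tokens.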