Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization [R]

Reddit r/MachineLearning / 4/14/2026

💬 Opinion · Signals & Early Trends · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper proposes a Depth-Recurrent Transformer (building on the earlier TRM approach) aimed at improving out-of-distribution (OOD) compositional generalization.
  • Reported results show decent OOD generalization on some tasks, but the approach appears to degrade significantly beyond a certain scale or setting, raising questions about why performance fails in harder regimes.
  • The authors argue that intermediate step supervision can harm generalization by making statistical heuristics “too easy” for the model to adopt, reducing incentive to perform genuine reasoning.
  • The discussion extends this claim to broader foundation-model weaknesses, suggesting a parallel to how experts may over-rely on intuition shaped by experience rather than explicit reasoning.
  • The work emphasizes the tradeoff between “thinking deeper” via structured reasoning signals versus “thinking longer” in ways that may encourage heuristic shortcutting.
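The core architectural idea behind depth recurrence can be illustrated in miniature. This is not the paper's actual model, just a hedged sketch of the general principle, assuming a toy residual block (linear map plus tanh): one weight-tied block is applied repeatedly, so effective depth scales with the iteration count while the parameter count stays fixed.

```python
import numpy as np

def block(x, W):
    # One weight-tied "layer": linear map + nonlinearity + residual connection.
    # (Toy stand-in for a transformer block; assumption, not the paper's block.)
    return x + np.tanh(x @ W)

def depth_recurrent_forward(x, W, n_iters):
    # Reuse the SAME parameters W at every depth step: effective depth
    # grows with n_iters, but the parameter count does not.
    for _ in range(n_iters):
        x = block(x, W)
    return x

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # the single shared parameter matrix
x = rng.normal(size=(4, 8))              # batch of 4 token vectors

shallow = depth_recurrent_forward(x, W, n_iters=2)
deep = depth_recurrent_forward(x, W, n_iters=16)  # "think deeper" at inference
print(shallow.shape, deep.shape)
```

The point of the design is that compute (depth) becomes a test-time knob decoupled from model size, which is why recurrence is a natural candidate for compositional tasks that need more steps than were seen in training.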

Paper:

https://arxiv.org/abs/2603.21676

I found this interesting as another iteration of the TRM approach:

  1. Shows decent OOD generalization in 2/3 tasks
    1. (But why does it fail beyond >2x? And why is unstructured text so much worse?)
  2. Explains why intermediate step supervision can hurt generalization.
    1. This makes statistical heuristics "irresistible" to the model, impairing investment in genuine "reasoning."
    2. I buy this, and would go further: it captures the (insidious) weaknesses of foundation models, and may even explain the trap expert humans fall into when they rely on their (expansive) experience to generate intuition, rather than thinking through a situation with fewer heuristics and more explicit reasoning.
submitted by /u/marojejian