One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

arXiv cs.AI / 4/22/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • Looped transformers can increase computational depth by repeatedly applying the same transformer block to iteratively refine predictions without adding parameters.
  • The paper argues that learning long, search-like refinement trajectories is hard when training only supervises the final target and not the intermediate steps.
  • It proposes Denoising Recursion Models, which use noise corruption during training but learn to denoise and reconstruct over multiple recursive steps (instead of a single step as in standard diffusion-style training).
  • The authors claim this creates a curriculum of intermediate states, better aligns training with inference-time behavior, and encourages non-greedy, forward-looking generation.
  • Experiments reportedly show improved performance over the Tiny Recursion Model (TRM) on ARC-AGI, the benchmark on which TRM recently achieved breakthrough results.
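The looped-transformer idea in the first key point can be sketched in a few lines. This is a toy illustration, not the paper's architecture: `shared_block` is a hypothetical stand-in for a transformer block, and all names and dimensions are invented for the example. The point is only that one set of weights, applied repeatedly, yields arbitrary computational depth with a fixed parameter count.

```python
import numpy as np

def shared_block(z, x, W):
    """One refinement step: a toy stand-in for a shared transformer block.
    The same weights W are reused at every loop iteration (weight tying),
    so depth grows with the loop count, not with the parameter count."""
    return np.tanh(W @ z + x)

def looped_refine(x, W, n_loops):
    """Iteratively refine a full fixed-size prediction z, starting from zeros.
    Each loop rewrites the whole prediction in parallel."""
    z = np.zeros_like(x)
    for _ in range(n_loops):
        z = shared_block(z, x, W)
    return z

# Hypothetical usage: 8 refinement loops of a 4-dimensional toy block.
rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((4, 4))
x = rng.standard_normal(4)
z = looped_refine(x, W, n_loops=8)
```

More loops give the model more sequential computation at inference time; the challenge the paper targets is that, with only final-target supervision, nothing tells the model what the intermediate states along a long refinement trajectory should look like.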

Abstract

Looped transformers scale computational depth without increasing parameter count by repeatedly applying a shared transformer block and can be used for iterative refinement, where each loop rewrites a full fixed-size prediction in parallel. On difficult problems, such as those that require search-like computation, reaching a highly structured solution starting from noise can require long refinement trajectories. Learning such trajectories is challenging when training specifies only the target solution and provides no supervision over the intermediate refinement path. Diffusion models tackle this issue by corrupting data with varying magnitudes of noise and training the model to reverse the corruption in a *single step*. However, this misaligns training and testing behaviour. We introduce Denoising Recursion Models, a method that similarly corrupts data with noise but trains the model to reverse the corruption over *multiple* recursive steps. This strategy provides a tractable curriculum of intermediate states, while better aligning training with testing and incentivizing non-greedy, forward-looking generation. Through extensive experiments, we show this approach outperforms the Tiny Recursion Model (TRM) on ARC-AGI, where TRM recently achieved breakthrough performance.