Form Follows Function: Recursive Stem Model

arXiv cs.AI / 3/18/2026


Key Points

RSM is a recursive reasoning model that preserves the TRM-style backbone but changes training to be depth-agnostic by detaching hidden-state history, treating early iterations as warm-up steps, and applying loss only at the final step.
It enables independent growth of the outer recursion depth H and inner compute depth L, and uses a stochastic outer-transition scheme (stochastic depth over H) to stabilize deeper architectures.
The approach yields >20× faster training than TRM and about a 5× reduction in error rate, while supporting test-time refinement over far more steps than were used in training (roughly 20,000 outer steps at test time versus 20 during training) without retraining.
In experiments, RSM reaches 97.5% exact accuracy on Sudoku-Extreme with test-time compute after roughly 1 hour of training on a single A100, and ~80% exact accuracy on Maze-Hard (30×30) after ~40 minutes of training using an attention-based instantiation.
The iterative settling process provides a reliability signal: non-settling trajectories warn of unresolved cases, while stable fixed points can be paired with domain verifiers for practical correctness checks.
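The training contract in the key points (detached warm-up iterations, loss applied only at the final step, stochastic depth over H) can be illustrated with a deliberately tiny sketch. This is not the paper's code: the scalar `cell`, the helpers `final_outer_step` and `train_step`, the parameters `p_skip` and `lr`, and the reading of "stochastic depth over H" as randomly skipping warm-up outer transitions are all hypothetical choices for illustration. Gradients are computed by hand in forward mode so the example needs only the standard library; the state entering the final outer step is treated as a constant, which is the effect a `.detach()` call has in an autograd framework such as PyTorch.

```python
import math
import random
from typing import List, Tuple


def cell(z: float, x: float, w: float) -> float:
    """Hypothetical weight-shared cell: one inner compute step z' = tanh(w*z + x)."""
    return math.tanh(w * z + x)


def final_outer_step(z: float, x: float, w: float, n_inner: int) -> Tuple[float, float]:
    """Run the last outer step (n_inner = L cell applications); return (z, dz/dw).

    The incoming z is treated as a constant (its tangent starts at 0), which is
    what detaching the hidden-state history does in an autograd framework.
    """
    dz_dw = 0.0
    for _ in range(n_inner):
        z_new = cell(z, x, w)
        # Forward-mode chain rule for z_new = tanh(w*z + x), where z depends on w.
        dz_dw = (1.0 - z_new ** 2) * (z + w * dz_dw)
        z = z_new
    return z, dz_dw


def train_step(w, x, y, h_train, n_inner, lr, p_skip, rng):
    """One update under a detached-warm-up, final-step-loss contract (sketch)."""
    z = 0.0
    # Warm-up: h_train - 1 outer steps that contribute no gradient.
    for _outer in range(h_train - 1):
        if rng.random() < p_skip:
            continue  # assumed stochastic depth over H: skip this outer transition
        for _inner in range(n_inner):
            z = cell(z, x, w)
    # Final outer step: the only place where the loss sees the parameter w.
    z_final, dz_dw = final_outer_step(z, x, w, n_inner)
    loss = (z_final - y) ** 2
    grad = 2.0 * (z_final - y) * dz_dw
    return w - lr * grad, loss


# Toy usage: fit w so that the settled state matches a target y.
rng = random.Random(0)
w, x, y = 0.0, 0.5, 0.8
losses: List[float] = []
for _ in range(300):
    w, loss = train_step(w, x, y, h_train=16, n_inner=2, lr=1.0, p_skip=0.25, rng=rng)
    losses.append(loss)

w_star = (math.atanh(y) - x) / y  # the w whose fixed point is exactly y
print(f"w = {w:.4f}, w* = {w_star:.4f}, final loss = {losses[-1]:.2e}")
```

Because no gradient flows through the warm-up history, the backward pass of each update only covers the final outer step, however large `h_train` is (the warm-up still costs forward compute). In this toy setting, pushing the final transition onto the target amounts to learning a fixed point that further iterations leave in place.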

Abstract

Recursive reasoning models such as Hierarchical Reasoning Model (HRM) and Tiny Recursive Model (TRM) show that small, weight-shared networks can solve compute-heavy and NP puzzles by iteratively refining latent states, but their training typically relies on deep supervision and/or long unrolls that increase wall-clock cost and can bias the model toward greedy intermediate behavior. We introduce Recursive Stem Model (RSM), a recursive reasoning approach that keeps the TRM-style backbone while changing the training contract so the network learns a stable, depth-agnostic transition operator. RSM fully detaches the hidden-state history during training, treats early iterations as detached "warm-up" steps, and applies loss only at the final step. We further grow the outer recursion depth $H$ and inner compute depth $L$ independently and use a stochastic outer-transition scheme (stochastic depth over $H$) to mitigate instability when increasing depth. This yields two key capabilities: (i) $>20\times$ faster training than TRM while improving accuracy ($\approx 5\times$ reduction in error rate), and (ii) test-time scaling where inference can run for arbitrarily many refinement steps ($\sim 20{,}000\ H_{\text{test}} \gg 20\ H_{\text{train}}$), enabling additional "thinking" without retraining. On Sudoku-Extreme, RSM reaches 97.5% exact accuracy with test-time compute (within ~1 hour of training on a single A100), and on Maze-Hard ($30 \times 30$) it reaches ~80% exact accuracy in ~40 minutes using an attention-based instantiation. Finally, because RSM implements an iterative settling process, convergence behavior provides a simple, architecture-native reliability signal: non-settling trajectories warn that the model has not reached a viable solution and can serve as a guard against hallucination, while stable fixed points can be paired with domain verifiers for practical correctness checks.
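The reliability signal described at the end of the abstract amounts to iterating the learned operator, checking whether the state stops changing, and only then consulting a domain verifier. The following is a minimal sketch under stated assumptions, not the paper's procedure: the settling criterion (largest per-coordinate change below a tolerance `tol`), the helpers `run_until_settled` and `accept`, and the toy operators standing in for a trained RSM are all hypothetical. The Sudoku check is an ordinary rule verifier of the kind the abstract pairs with stable fixed points.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = List[float]


def run_until_settled(
    step: Callable[[State], State],
    z0: Sequence[float],
    max_steps: int,
    tol: float = 1e-6,
) -> Tuple[State, int, bool]:
    """Iterate a transition operator until the state stops changing.

    Returns (final_state, steps_taken, settled). settled is False when the
    trajectory is still moving after max_steps, which is treated as a warning.
    """
    z = list(z0)
    for t in range(1, max_steps + 1):
        z_next = step(z)
        delta = max(abs(a - b) for a, b in zip(z_next, z))  # L-infinity change
        z = z_next
        if delta < tol:
            return z, t, True
    return z, max_steps, False


def is_valid_sudoku(grid: Sequence[Sequence[int]]) -> bool:
    """Domain verifier: every row, column, and 3x3 box holds the digits 1-9."""
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == digits for group in rows + cols + boxes)


def accept(settled: bool, candidate, verifier: Callable[..., bool]) -> bool:
    """Trust an answer only if the trajectory settled AND the verifier agrees."""
    return settled and verifier(candidate)


# Toy operators standing in for a trained network (hypothetical).
def contracting(z: State) -> State:
    return [math.tanh(0.5 * zi + 0.3) for zi in z]  # a contraction, so it settles


def oscillating(z: State) -> State:
    return [-zi for zi in z]  # flips sign forever, never settles


_, steps_a, settled_a = run_until_settled(contracting, [0.0, 1.0], max_steps=20_000)
_, steps_b, settled_b = run_until_settled(oscillating, [1.0], max_steps=20_000)

# A valid grid from a standard shifting pattern, used to exercise the verifier.
solved = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
broken = [row[:] for row in solved]
broken[0][0], broken[0][1] = broken[0][1], broken[0][0]  # duplicates in two columns
```

A trajectory that reaches `max_steps` without settling (like `oscillating`) is flagged instead of being returned as an answer; a settled state still goes through the verifier, since settling alone does not guarantee that the fixed point is correct.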
