Tiny Recursive Reasoning with Mamba-2 Attention Hybrid

arXiv cs.CL / 3/16/2026

Opinion | Ideas & Deep Analysis | Models & Research

Key Points

  • The study investigates replacing Transformer blocks in TRM with Mamba-2 hybrid operators while keeping parameter counts nearly identical (6.83M vs 6.86M).
  • On ARC-AGI-1, the Mamba-2 hybrid improves pass@2 by 2.0 percentage points (45.88% vs 43.88%) and shows a larger gain of 4.75 percentage points at pass@100, while pass@1 remains on par.
  • The results suggest the hybrid preserves recursive reasoning capability within the scaffold and increases candidate coverage without harming top-1 selection.
  • The work positions SSM-based operators as viable candidates in the recursive operator design space and takes a first step toward understanding the best mixing strategies for recursive reasoning.
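
To make the pass@K comparison concrete, here is a minimal Python sketch. It assumes each task comes with a ranked list of candidate outputs and that a task counts as solved at K when the target appears among the top K candidates; the paper's exact evaluation protocol may differ. The function name `pass_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Hashable, Sequence


def pass_at_k(
    ranked_candidates: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int,
) -> float:
    """Fraction of tasks whose target appears among the top-k ranked candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_candidates) != len(targets):
        raise ValueError("need exactly one candidate list per target")
    if not targets:
        raise ValueError("need at least one task")
    solved = sum(
        1
        for candidates, target in zip(ranked_candidates, targets)
        if target in candidates[:k]
    )
    return solved / len(targets)


# Toy data (hypothetical): four tasks, two ranked candidates per task.
targets = ["A", "B", "C", "D"]
baseline_ranked = [["A", "x"], ["x", "y"], ["x", "y"], ["x", "y"]]
hybrid_ranked = [["A", "x"], ["x", "B"], ["x", "y"], ["x", "y"]]

# Both toy models agree at K=1, but the second one covers more tasks at K=2.
print(pass_at_k(baseline_ranked, targets, 1))  # 0.25
print(pass_at_k(hybrid_ranked, targets, 1))    # 0.25
print(pass_at_k(baseline_ranked, targets, 2))  # 0.25
print(pass_at_k(hybrid_ranked, targets, 2))    # 0.5
```

The toy numbers mirror the pattern reported above in miniature: identical pass@1, with the difference appearing only once more than one candidate is allowed.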

Abstract

Recent work on recursive reasoning models like TRM demonstrates that tiny networks (7M parameters) can achieve strong performance on abstract reasoning tasks through latent recursion: iterative refinement in hidden representation space without emitting intermediate tokens. This raises a natural question about operator choice. Mamba-2's state space recurrence is itself a form of iterative refinement, making it a natural candidate for recursive reasoning, but does introducing Mamba-2 into the recursive scaffold preserve reasoning capability? We investigate this by replacing the Transformer blocks in TRM with Mamba-2 hybrid operators while maintaining parameter parity (6.83M vs 6.86M parameters). On ARC-AGI-1, we find that the hybrid improves pass@2 (the official metric) by +2.0 percentage points (45.88% vs 43.88%) and consistently outperforms at higher K values (+4.75 percentage points at pass@100), whilst maintaining pass@1 parity. This suggests improved candidate coverage (the model generates correct solutions more reliably) with similar top-1 selection. Our results validate that Mamba-2 hybrid operators preserve reasoning capability within the recursive scaffold, establishing SSM-based operators as viable candidates in the recursive operator design space and taking a first step towards understanding the best mixing strategies for recursive reasoning.
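
The idea of latent recursion with a swappable operator can be illustrated with a toy Python sketch. This is not TRM's architecture or the paper's code: `latent_recursion`, `toy_operator`, the zero initial state, and the halfway-update rule are all hypothetical stand-ins chosen only to show a hidden state being refined repeatedly by one reusable operator, with nothing decoded between steps.

```python
from typing import Callable, List

Vector = List[float]
Operator = Callable[[Vector, Vector], Vector]


def latent_recursion(x: Vector, operator: Operator, steps: int) -> Vector:
    """Refine a hidden state z for `steps` iterations without decoding in between."""
    z = [0.0] * len(x)  # initial latent state (assumed to be zeros here)
    for _ in range(steps):
        z = operator(x, z)  # the same operator is reused at every step
    return z


def toy_operator(x: Vector, z: Vector) -> Vector:
    """Hypothetical stand-in for a Transformer or Mamba-2 hybrid block.

    It simply moves each component of z halfway toward the input x.
    """
    return [0.5 * zi + 0.5 * xi for xi, zi in zip(x, z)]


refined = latent_recursion([1.0, -2.0], toy_operator, steps=3)
print(refined)  # [0.875, -1.75]
```

Because the scaffold only needs a callable with this shape, swapping the operator (the question the abstract raises) leaves the recursion loop itself unchanged.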