Vision Tiny Recursion Model (ViTRM): Parameter-Efficient Image Classification via Recursive State Refinement

arXiv cs.CV / 3/23/2026


Key Points

  • The Vision Tiny Recursion Model (ViTRM) replaces the L-layer ViT encoder with a tiny 3-layer block applied recursively a fixed number of times to perform iterative state refinement.
  • It achieves up to 6x fewer parameters than CNN-based models and up to 84x fewer than ViT, while maintaining competitive accuracy on CIFAR-10 and CIFAR-100.
  • The approach demonstrates that recursive computation can substitute for deep architectural stacks in vision tasks without sacrificing performance.
  • By enabling parameter-efficient vision models, ViTRM could broaden deployment in resource-constrained environments and influence future model design decisions.

Abstract

The success of deep learning in computer vision has been driven by models of increasing scale, from deep Convolutional Neural Networks (CNNs) to large Vision Transformers (ViTs). While effective, these architectures are parameter-intensive and demand significant computational resources, limiting deployment in resource-constrained environments. Inspired by Tiny Recursive Models (TRM), which show that small recursive networks can solve complex reasoning tasks through iterative state refinement, we introduce the Vision Tiny Recursion Model (ViTRM): a parameter-efficient architecture that replaces the L-layer ViT encoder with a single tiny k-layer block (k = 3) applied recursively N times. Despite using up to 6× and 84× fewer parameters than CNN-based models and ViT respectively, ViTRM maintains competitive performance on CIFAR-10 and CIFAR-100. This demonstrates that recursive computation is a viable, parameter-efficient alternative to architectural depth in vision.
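The core idea, weight sharing through recursion, can be sketched in a few lines: a single small block of k = 3 layers is applied N times to the same state, so the parameter count depends only on the block size, not on the number of refinement steps. The sketch below is a toy illustration under assumed values (state dimension d, recursion depth N, and tanh layers are hypothetical stand-ins, not the paper's actual ViTRM block):

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_block(z, params):
    """One pass through the shared k-layer block (toy: linear + tanh layers)."""
    for W, b in params:
        z = np.tanh(z @ W + b)
    return z

d = 16   # hypothetical state/embedding dimension (for illustration only)
k = 3    # layers in the shared tiny block, as in the paper
N = 6    # hypothetical number of recursive applications

# A single set of weights, reused at every recursion step.
params = [(rng.normal(scale=0.1, size=(d, d)), np.zeros(d)) for _ in range(k)]

z = rng.normal(size=(1, d))   # initial state (stand-in for patch embeddings)
for _ in range(N):            # recursive refinement: same block, applied N times
    z = tiny_block(z, params)

# Parameter count is k * (d*d + d), independent of N -- unlike a stacked
# L-layer encoder, whose parameter count grows linearly with depth.
n_params = sum(W.size + b.size for W, b in params)
print(n_params)  # 3 * (256 + 16) = 816
```

Increasing N deepens the effective computation without adding a single parameter, which is the trade-off the paper exploits to approach ViT-level accuracy at a fraction of the model size.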