InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
arXiv cs.AI · March 20, 2026
Key Points
- InfoMamba introduces an attention-free hybrid architecture that replaces token-level self-attention with a linear filtering layer acting as a minimal-bandwidth global interface, paired with a selective recurrent stream (see the sketch after this list).
- A consistency boundary analysis characterizes when diagonal short-memory SSMs can approximate causal attention and identifies the structural gaps that remain.
- Information-maximizing fusion (IMF) dynamically injects global context into the SSM dynamics, and a mutual-information-inspired training objective encourages the two streams to carry complementary information.
- Empirical results across classification, dense prediction, and non-vision tasks show InfoMamba outperforming strong Transformer and SSM baselines while scaling near-linearly with sequence length, with competitive accuracy-efficiency trade-offs.
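
The summary above only describes the design at a high level. The sketch below shows one plausible way such a block could be wired up; it is not the authors' implementation. All class and parameter names (`GlobalLinearFilter`, `SelectiveSSM`, `HybridBlock`, `bandwidth`) are hypothetical, the "minimal-bandwidth global interface" is modeled as a causal low-rank linear filter, and a simplified diagonal gated scan stands in for the selective recurrent stream.

```python
# Minimal sketch (hypothetical, not the paper's code) of an attention-free
# hybrid block: a low-bandwidth global linear filter fused into a simplified
# selective SSM stream via an input-dependent gate (an IMF-style fusion).
import torch
import torch.nn as nn


class GlobalLinearFilter(nn.Module):
    """Hypothetical global branch: tokens read a small set of causally pooled
    channels instead of attending to every other token."""
    def __init__(self, dim: int, bandwidth: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bandwidth)  # compress token features
        self.up = nn.Linear(bandwidth, dim)    # broadcast summary back to tokens

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, dim); causal cumulative mean keeps the filter causal
        z = self.down(x)
        csum = torch.cumsum(z, dim=1)
        counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
        return self.up(csum / counts)


class SelectiveSSM(nn.Module):
    """Simplified diagonal gated scan standing in for the selective recurrent
    stream; the decay gate depends on the (fused) input."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)  # input-dependent decay
        self.inp = nn.Linear(dim, dim)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        decay = torch.sigmoid(self.gate(u))  # (batch, length, dim) in (0, 1)
        v = self.inp(u)
        h = torch.zeros_like(u[:, 0])
        outs = []
        for t in range(u.size(1)):           # sequential scan, kept simple for clarity
            h = decay[:, t] * h + (1 - decay[:, t]) * v[:, t]
            outs.append(h)
        return torch.stack(outs, dim=1)


class HybridBlock(nn.Module):
    """One reading of IMF-style fusion: the global summary is gated into the
    SSM input, so the recurrence sees global context at every step."""
    def __init__(self, dim: int, bandwidth: int = 16):
        super().__init__()
        self.global_branch = GlobalLinearFilter(dim, bandwidth)
        self.ssm = SelectiveSSM(dim)
        self.fuse_gate = nn.Linear(2 * dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(x)
        alpha = torch.sigmoid(self.fuse_gate(torch.cat([x, g], dim=-1)))
        fused = x + alpha * g                 # inject global context into the SSM input
        return x + self.ssm(self.norm(fused))  # residual connection


if __name__ == "__main__":
    block = HybridBlock(dim=64)
    tokens = torch.randn(2, 128, 64)
    print(block(tokens).shape)  # torch.Size([2, 128, 64])
```

Note that the cumulative-mean pooling is only one way to keep the global branch causal, and the mutual-information-inspired objective from the paper would be an additional training loss on the two streams, which this sketch does not attempt to reproduce.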