FractalMamba++: Scaling Vision Mamba Across Resolutions via Hilbert Fractal Geometry

arXiv cs.CV / 5/6/2026

Key Points

The paper addresses a key limitation of Vision Mamba: performance can degrade when 2D patch grids are serialized into a 1D recurrence, especially at inference resolutions larger than the training grid.
It introduces FractalMamba++, which uses Hilbert curve–based fractal serialization to better preserve spatial locality across resolutions, improving neighborhood consistency compared with raster/linear scans.
The model adds a Fractal Hierarchy Skip Connection (FHSC) that injects long-range state using deterministic routes derived from Hilbert recursion, reducing long-sequence information fading without runtime search or custom CUDA kernels.
It further incorporates Fractal-Aware 2D Rotary Position Encoding (FA-RoPE) to tie positional interactions to true 2D proximity and fractal hierarchy level rather than the serialized 1D distance.
Experiments across ImageNet classification, COCO detection/segmentation, ADE20K segmentation, and LEVIR-CD+ change detection show FractalMamba++ delivers improved results over existing Mamba-based vision backbones, particularly for high-resolution inputs.
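The locality claim behind fractal serialization can be checked with a small sketch. The code below is not the paper's implementation: it uses the standard textbook index-to-coordinate Hilbert mapping, and the helper names (`hilbert_d2xy`, `hilbert_coords`, `raster_coords`, `max_step`, `serialize`) are hypothetical. It assumes a square patch grid whose side is a power of two.

```python
import numpy as np

def hilbert_d2xy(order: int, d: int) -> tuple[int, int]:
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the sub-square built so far.
                x, y = s - 1 - x, s - 1 - y
            # Transpose the sub-square built so far.
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_coords(order: int) -> np.ndarray:
    """(x, y) of every patch in Hilbert scan order, shape (4**order, 2)."""
    n = 1 << order
    return np.array([hilbert_d2xy(order, d) for d in range(n * n)])

def raster_coords(order: int) -> np.ndarray:
    """(x, y) of every patch in row-major scan order, shape (4**order, 2)."""
    n = 1 << order
    ys, xs = np.divmod(np.arange(n * n), n)
    return np.stack([xs, ys], axis=1)

def max_step(coords: np.ndarray) -> int:
    """Largest Manhattan distance between consecutive patches in a scan."""
    return int(np.abs(np.diff(coords, axis=0)).sum(axis=1).max())

def serialize(tokens: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Reorder an (H, W, C) patch grid into an (H*W, C) sequence."""
    return tokens[coords[:, 1], coords[:, 0]]

if __name__ == "__main__":
    print(max_step(hilbert_coords(3)))  # 1: consecutive tokens are grid neighbors
    print(max_step(raster_coords(3)))   # 8: the jump at each row wrap on an 8x8 grid
    # Every 4th point of the 16x16 curve, halved, retraces the 8x8 curve.
    print(np.array_equal(hilbert_coords(4)[::4] // 2, hilbert_coords(3)))  # True
```

In this sketch, consecutive tokens in the Hilbert scan are always adjacent patches, while a raster scan jumps across the full grid width at every row boundary. The last check illustrates the cross-resolution consistency in a limited sense: doubling the grid side refines the same coarse path instead of producing a different one.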

Abstract

Vision Mamba offers linear complexity for long visual sequences, yet its performance depends critically on how a two-dimensional patch grid is serialized into a one-dimensional state-space recurrence. Raster-style scans disrupt spatial continuity, and the mismatch between 2D locality and 1D state propagation becomes increasingly severe when the inference resolution grows beyond the training grid. This paper presents FractalMamba++, a resolution-scalable vision backbone organized around a single geometric principle: the recursive self-similar structure of the Hilbert curve determines how patches are serialized, where long-range state shortcuts are inserted, and how positional relations are encoded. First, Hilbert-curve-based Fractal Serialization preserves local 2D neighborhoods more faithfully than linear scans and provides consistent neighborhood statistics across resolutions. Second, the Fractal Hierarchy Skip Connection (FHSC) derives a compact set of deterministic state-injection routes from Hilbert recursion levels, mitigating long-sequence information fading without runtime search, hand-derived gradients, or dedicated CUDA kernels. Third, Fractal-Aware 2D Rotary Position Encoding (FA-RoPE) combines normalized 2D coordinates with a fractal hierarchy level so that feature interactions depend on actual spatial proximity and recursive structural role rather than serialized 1D distance. Extensive experiments on ImageNet-1K classification, COCO detection and instance segmentation, ADE20K semantic segmentation, and LEVIR-CD+ remote sensing change detection show that FractalMamba++ improves over existing Mamba-based vision backbones, especially under high-resolution inputs.
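To make the positional-encoding idea concrete, here is a minimal sketch of a plain axial 2D rotary encoding over normalized patch coordinates. It is an illustration under stated assumptions, not FA-RoPE itself: the abstract does not say how the fractal hierarchy level enters the encoding, so that component is omitted. The function name `rope_2d`, the channel split (first half for x, second half for y), and the frequency base are all hypothetical choices.

```python
import numpy as np

def rope_2d(feats: np.ndarray, coords: np.ndarray, base: float = 100.0) -> np.ndarray:
    """Rotate channel pairs of feats (N, D) by angles derived from coords (N, 2).

    Assumption: D is divisible by 4; the first D/2 channels encode x and the
    remaining D/2 channels encode y. The hierarchy-level term is not modeled.
    """
    n, d = feats.shape
    assert d % 4 == 0, "channel count must be divisible by 4"
    half = d // 2
    pairs = half // 2
    # Geometric frequency ladder, one frequency per channel pair.
    freqs = base ** (-np.arange(pairs) / pairs)
    out = np.empty_like(feats, dtype=float)
    for axis in range(2):
        angles = coords[:, axis:axis + 1] * freqs  # shape (N, pairs)
        cos, sin = np.cos(angles), np.sin(angles)
        block = feats[:, axis * half:(axis + 1) * half]
        a, b = block[:, 0::2], block[:, 1::2]
        rotated = np.empty_like(block, dtype=float)
        rotated[:, 0::2] = a * cos - b * sin
        rotated[:, 1::2] = a * sin + b * cos
        out[:, axis * half:(axis + 1) * half] = rotated
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(1, 8))
    k = rng.normal(size=(1, 8))

    def score(pos_q, pos_k):
        rq = rope_2d(q, np.array([pos_q]))
        rk = rope_2d(k, np.array([pos_k]))
        return float(rq[0] @ rk[0])

    # Both pairs share the 2D offset (0.3, 0.4), so the scores agree.
    print(np.isclose(score((0.1, 0.2), (0.4, 0.6)), score((0.3, 0.3), (0.6, 0.7))))
```

Because each channel pair is rotated by an angle linear in a 2D coordinate, the inner product of two encoded features depends only on their 2D offset, regardless of where the two patches fall in the serialized sequence.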
