Event-Adaptive State Transition and Gated Fusion for RGB-Event Object Tracking

arXiv cs.AI / 4/16/2026


Key Points

  • The paper argues that current RGB-Event (RGBE) object tracking models built on Vision Mamba use fixed state-transition matrices that do not adapt to fluctuations in event sparsity, hurting cross-modal fusion robustness.
  • It introduces MambaTrack, a multimodal tracking framework based on a Dynamic State Space Model (DSSM) with an event-adaptive state transition mechanism that modulates transition behavior according to event stream density.
  • The framework includes a Gated Projection Fusion (GPF) module that projects RGB features into the event feature space and uses gates derived from event density and RGB confidence to control fusion strength.
  • Experiments report state-of-the-art results on the FE108 and FELT datasets, and the authors claim the lightweight design could support real-time embedded deployment.
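
The gated fusion described in the third point can be illustrated with a minimal sketch. The source does not give the GPF equations, so everything here is an assumption made for illustration: the linear projection, the sigmoid gate, the residual-style sum, and the names `project`, `gated_fusion`, `w_density`, `w_conf`, and `bias`. In a real model the projection matrix and gate weights would be learned.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def project(rgb_feat, weight):
    """Linearly map an RGB feature vector into the event feature space.

    `weight` has one row per event-feature dimension (hypothetical layout).
    """
    return [sum(w * x for w, x in zip(row, rgb_feat)) for row in weight]


def gated_fusion(event_feat, rgb_feat, weight, density, rgb_conf,
                 w_density=-2.0, w_conf=2.0, bias=0.0):
    """Fuse projected RGB features into event features through a scalar gate.

    Assumption: the gate is a sigmoid of a weighted sum of event density and
    RGB confidence. The default weights are placeholders, not paper values.
    """
    rgb_proj = project(rgb_feat, weight)
    gate = sigmoid(w_density * density + w_conf * rgb_conf + bias)
    # The gate scales how much of the projected RGB signal is added.
    fused = [e + gate * r for e, r in zip(event_feat, rgb_proj)]
    return fused, gate
```

With these placeholder weights, a higher RGB confidence opens the gate and a higher event density closes it; whether MambaTrack uses that direction, or a per-channel rather than scalar gate, is not stated in the summary.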

Abstract

Existing Vision Mamba-based RGB-Event (RGBE) tracking methods rely on static state transition matrices, which cannot adapt to variations in event sparsity. This rigidity leads to imbalanced modeling (underfitting sparse event streams and overfitting dense ones), which degrades cross-modal fusion robustness. To address these limitations, we propose MambaTrack, an efficient multimodal tracking framework built upon a Dynamic State Space Model (DSSM). Our contributions are twofold. First, we introduce an event-adaptive state transition mechanism that dynamically modulates the state transition matrix based on event stream density. A learnable scalar governs the state evolution rate, enabling differentiated modeling of sparse and dense event flows. Second, we develop a Gated Projection Fusion (GPF) module for robust cross-modal integration. This module projects RGB features into the event feature space and generates adaptive gates from event density and RGB confidence scores. These gates control the fusion intensity, suppressing noise while preserving complementary information. Experiments show that MambaTrack achieves state-of-the-art performance on the FE108 and FELT datasets. Its lightweight design suggests potential for real-time embedded deployment.
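
To make the event-adaptive state transition more concrete, here is a minimal scalar sketch. The abstract does not give the DSSM update rule, so this is an assumption modeled on the common discretized form of a diagonal state space model: a step size derived from event density scales how quickly the state evolves. The names `event_density`, `adaptive_scan`, `base_step`, and `alpha` (standing in for the learnable scalar) are hypothetical, as is the choice of "fraction of active pixels" as the density measure.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def event_density(event_frame):
    """Fraction of pixels with at least one event (hypothetical measure)."""
    total = sum(len(row) for row in event_frame)
    active = sum(1 for row in event_frame for v in row if v != 0)
    return active / total if total else 0.0


def adaptive_scan(inputs, densities, a=-1.0, b=1.0, base_step=0.0, alpha=2.0):
    """Scalar state space scan with a density-dependent transition.

    Assumed update (not taken from the paper):
        step_t = softplus(base_step + alpha * density_t)
        h_t    = exp(step_t * a) * h_{t-1} + step_t * b * x_t
    With a < 0 the transition factor exp(step_t * a) lies in (0, 1).
    """
    h = 0.0
    states = []
    for x, rho in zip(inputs, densities):
        step = softplus(base_step + alpha * rho)
        a_bar = math.exp(step * a)  # density-modulated state transition
        h = a_bar * h + step * b * x
        states.append(h)
    return states
```

In this sketch a positive `alpha` makes the state evolve faster when events are dense and retain more history when they are sparse. A static transition corresponds to `alpha = 0`, where the density no longer affects the update.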