FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking

arXiv cs.CV / 4/17/2026


Key Points

  • Existing RGB-only visual tracking methods struggle in complex, dynamic scenes, and while event sensors can help, many RGB-event fusion approaches underutilize event data’s temporal and high-frequency properties.
  • The paper proposes FreqTrack, a frequency-aware RGB-event (RGBE) object tracking framework that uses frequency-domain transformations to build complementary correlations between modalities for stronger feature fusion.
  • It introduces a Spectral Enhancement Transformer (SET) layer with multi-head dynamic Fourier filtering to adaptively enhance and select frequency-domain features.
  • It also adds a Wavelet Edge Refinement (WER) module that uses learnable wavelet transforms to extract multi-scale edge structures from event data, improving performance in fast-motion and low-light conditions.
  • Experiments on COESOT and FE108 show competitive results, including a top precision of 76.6% on the COESOT benchmark, supporting the value of frequency-domain modeling for RGBE tracking.
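
To make the Fourier filtering idea in the key points above more concrete, here is a minimal sketch of frequency-domain filtering on a 1D signal: transform, scale each frequency bin by a gain, and transform back. It is not the paper's SET layer. The names `dft`, `idft`, and `spectral_filter` are hypothetical, and the fixed gain lists stand in for what would presumably be learned, input-dependent, multi-head filters in FreqTrack.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(), including the 1/n normalisation."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_filter(signal, gains):
    """Scale each frequency bin by a gain, then return to the signal domain.

    The gains are assumed symmetric (gains[k] == gains[n - k]) so that a real
    input stays real; only the real part of the result is kept.
    """
    spectrum = dft(signal)
    filtered = [g * s for g, s in zip(gains, spectrum)]
    return [x.real for x in idft(filtered)]


# Hypothetical fixed gains; a dynamic filter would predict these per input.
signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
all_pass = [1.0] * len(signal)                # keeps every frequency
dc_only = [1.0] + [0.0] * (len(signal) - 1)   # keeps only the mean component

unchanged = spectral_filter(signal, all_pass)
smoothed = spectral_filter(signal, dc_only)
```

With `all_pass` the signal is reconstructed up to floating-point error, while `dc_only` collapses it to its mean. A learned filter sits between these extremes, boosting the bins that help tracking and suppressing the rest.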

Abstract

Existing single-modal RGB trackers often face performance bottlenecks in complex dynamic scenes, while the introduction of event sensors offers new potential for enhancing tracking capabilities. However, most current RGB-event fusion methods, primarily designed in the spatial domain using convolutional, Transformer, or Mamba architectures, fail to fully exploit the unique temporal response and high-frequency characteristics of event data. To address this, we propose FreqTrack, a frequency-aware RGBE tracking framework that establishes complementary inter-modal correlations through frequency-domain transformations for more robust feature fusion. We design a Spectral Enhancement Transformer (SET) layer that incorporates multi-head dynamic Fourier filtering to adaptively enhance and select frequency-domain features. Additionally, we develop a Wavelet Edge Refinement (WER) module, which leverages learnable wavelet transforms to explicitly extract multi-scale edge structures from event data, effectively improving modeling capability in high-speed and low-light scenarios. Extensive experiments on the COESOT and FE108 datasets demonstrate that FreqTrack achieves highly competitive performance, notably attaining a leading precision of 76.6% on the COESOT benchmark, validating the effectiveness of frequency-domain modeling for RGBE tracking.
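
The multi-scale edge extraction described for the WER module can be illustrated with a fixed Haar wavelet, sketched below under stated assumptions. The abstract specifies learnable wavelet transforms, so the fixed Haar filters here are a stand-in rather than the authors' method, and `haar_step` and `multiscale_details` are hypothetical names. The detail coefficients at each level respond to intensity changes at that scale, which is why they act as edge indicators.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def multiscale_details(signal, levels):
    """Collect detail (edge-like) coefficients from fine to coarse scales.

    Assumes len(signal) is divisible by 2 ** levels.
    """
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# A step edge between index 2 and index 3.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
coarse, edges = multiscale_details(step, levels=2)
```

Here only the coefficients whose support covers the step are nonzero: one at the fine scale (`edges[0][1]`) and one at the coarser scale (`edges[1][0]`). Flat regions produce zeros at every level.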