Bidirectional Cross-Modal Prompting for Event-Frame Asymmetric Stereo

arXiv cs.CV / 4/17/2026


Key Points

  • Conventional frame cameras can struggle with fast motion due to limited temporal resolution and motion blur, while event cameras handle high dynamic range and motion more effectively.
  • The paper proposes Bi-CMPStereo, a bidirectional cross-modal prompting framework to better bridge the “modality gap” between event and frame data for stereo matching.
  • By projecting each modality into both the event and frame domains and learning aligned representations in a shared canonical space, the method aims to preserve domain-specific cues.
  • Experiments reported in the paper show improved performance over state-of-the-art approaches in both accuracy and generalization, especially for challenging dynamic scenes.
  • The work targets more reliable 3D perception via event-frame asymmetric stereo, leveraging complementary strengths of both sensing modalities.

Abstract

Conventional frame-based cameras capture rich contextual information but suffer from limited temporal resolution and motion blur in dynamic scenes. Event cameras offer an alternative visual representation with higher dynamic range, free from such limitations. The complementary characteristics of the two modalities make event-frame asymmetric stereo promising for reliable 3D perception under fast motion and challenging illumination. However, the modality gap often leads to the marginalization of domain-specific cues essential for cross-modal stereo matching. In this paper, we introduce Bi-CMPStereo, a novel bidirectional cross-modal prompting framework that fully exploits semantic and structural features from both domains for robust matching. Our approach learns finely aligned stereo representations within a target canonical space and integrates complementary representations by projecting each modality into both event and frame domains. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in accuracy and generalization.
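The abstract describes two ingredients: prompting in both directions (event→frame and frame→event) and projecting both modalities into a shared canonical space. The paper does not publish its architecture here, so the following is only a minimal PyTorch sketch of that idea, assuming cross-attention as the prompting mechanism; all class names, the shared linear projection, and the token/feature dimensions are illustrative assumptions, not the authors' actual design.

```python
import torch
import torch.nn as nn


class CrossModalPrompt(nn.Module):
    """One prompting direction: target-domain tokens query the source
    domain and absorb its cues (cross-attention is an assumption here)."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target: torch.Tensor, source: torch.Tensor) -> torch.Tensor:
        prompted, _ = self.attn(target, source, source)
        return self.norm(target + prompted)  # residual keeps target cues


class BiCMPBlock(nn.Module):
    """Bidirectional prompting: frame->event and event->frame, followed by
    a shared projection into one canonical space (hypothetical layout)."""

    def __init__(self, dim: int):
        super().__init__()
        self.event_from_frame = CrossModalPrompt(dim)
        self.frame_from_event = CrossModalPrompt(dim)
        self.to_canonical = nn.Linear(dim, dim)  # shared across modalities

    def forward(self, frame_feats: torch.Tensor, event_feats: torch.Tensor):
        e = self.event_from_frame(event_feats, frame_feats)
        f = self.frame_from_event(frame_feats, event_feats)
        return self.to_canonical(f), self.to_canonical(e)


# Toy usage: batch of 2, 64 tokens per view, 32-dim features.
frame = torch.randn(2, 64, 32)
event = torch.randn(2, 64, 32)
f_c, e_c = BiCMPBlock(32)(frame, event)
```

In a full stereo pipeline, `f_c` and `e_c` would then feed a standard cost-volume or matching head; the sketch stops at the aligned representations, which is the part the abstract actually specifies.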