Match-Any-Events: Zero-Shot Motion-Robust Feature Matching Across Wide Baselines for Event Cameras

arXiv cs.CV / 4/22/2026


Key Points

  • The paper proposes the first event-camera feature matching model that performs zero-shot wide-baseline correspondence across datasets without target-domain fine-tuning or adaptation.
  • It introduces a motion-robust, computationally efficient attention backbone that learns multi-timescale features from event streams, along with sparsity-aware event token selection to keep large-scale training feasible.
  • To overcome the lack of wide-baseline supervision, the authors build a robust event motion synthesis framework that generates large-scale training datasets with varied viewpoints, modalities, and motions.
  • Experiments on multiple benchmarks show a 37.7% improvement over the previous best event feature matching methods, and the code and data are publicly released.

Abstract

Event cameras have recently shown promising capabilities in instantaneous motion estimation due to their robustness to low light and fast motions. However, computing wide-baseline correspondence between two arbitrary views remains a significant challenge, since event appearance changes substantially with motion, and learning-based approaches are constrained by both scalability and limited wide-baseline supervision. We therefore introduce the first event matching model that achieves cross-dataset wide-baseline correspondence in a zero-shot manner: a single model trained once is deployed on unseen datasets without any target-domain fine-tuning or adaptation. To enable this capability, we design a motion-robust and computationally efficient attention backbone that learns multi-timescale features from event streams, augmented with sparsity-aware event token selection, making large-scale training on diverse wide-baseline supervision computationally feasible. To provide the supervision needed for wide-baseline generalization, we develop a robust event motion synthesis framework to generate large-scale event-matching datasets with augmented viewpoints, modalities, and motions. Extensive experiments across multiple benchmarks show that our framework achieves a 37.7% improvement over the previous best event feature matching methods. Code and data are available at: https://github.com/spikelab-jhu/Match-Any-Events.
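To make the two backbone ideas concrete, the sketch below illustrates what "multi-timescale features from event streams" and "sparsity-aware event token selection" could look like in their simplest form. This is a minimal illustration under assumed conventions, not the paper's actual implementation: it assumes events are `(x, y, t, polarity)` tuples, builds signed voxel grids at several temporal resolutions, and keeps only the densest patch tokens so that attention never runs over empty regions. All function names and parameters here are hypothetical.

```python
import numpy as np

def multi_timescale_voxels(events, H, W, scales=(1, 2, 4)):
    """Accumulate events into voxel grids at several timescales.

    events: (N, 4) array of (x, y, t, polarity in {-1, +1}).
    Returns one (s, H, W) grid per scale s, where the time range is
    split into s bins and signed polarities are summed per pixel.
    """
    t = events[:, 2]
    t0, t1 = t.min(), t.max() + 1e-9
    grids = []
    for s in scales:
        bins = np.minimum(((t - t0) / (t1 - t0) * s).astype(int), s - 1)
        g = np.zeros((s, H, W))
        # Scatter-add each event's polarity into its (time bin, y, x) cell.
        np.add.at(g, (bins,
                      events[:, 1].astype(int),
                      events[:, 0].astype(int)), events[:, 3])
        grids.append(g)
    return grids

def select_sparse_tokens(grid, patch=4, keep_ratio=0.25):
    """Sparsity-aware token selection: keep only the densest patches.

    Tokenizes an (H, W) event map into non-overlapping patch x patch
    tiles and returns the indices of the top keep_ratio fraction by
    absolute event density, so downstream attention skips empty space.
    """
    H, W = grid.shape
    ph, pw = H // patch, W // patch
    patches = grid[:ph * patch, :pw * patch].reshape(ph, patch, pw, patch)
    density = np.abs(patches).sum(axis=(1, 3)).ravel()
    k = max(1, int(keep_ratio * density.size))
    keep = np.argsort(density)[-k:]  # indices of the densest tokens
    return keep, density[keep]
```

The intuition matches the abstract's claim: event streams are mostly empty, so discarding low-density tokens shrinks the attention cost enough to make large-scale training feasible, while multiple timescales capture both fast and slow motion in the same representation.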