EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling

arXiv cs.CV / 4/9/2026


Key Points

  • The paper proposes EventFace, an event-camera-based face recognition approach that models identity using structure-driven spatiotemporal representations rather than relying on stable RGB appearance.
  • To address the lack of dedicated event-based face datasets, the authors create EFace, a small-scale dataset captured under rigid facial motion.
  • EventFace transfers spatial facial priors from pretrained RGB face models to the event domain using LoRA, then encodes temporal information with a Motion Prompt Encoder (MPE) and fuses spatial and temporal features via a Spatiotemporal Modulator (STM).
  • Experiments report the best performance among the evaluated baselines, including a Rank-1 identification rate of 94.19% and an EER of 5.35%, along with improved robustness under degraded illumination.
  • The learned representations are also described as having reduced template reconstructability, suggesting potential privacy benefits.
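The paper does not ship code, but the LoRA transfer step it describes follows a well-known recipe: freeze a pretrained weight matrix and learn only a low-rank additive update. The sketch below illustrates that idea for a single linear layer using NumPy stand-ins; all weights, shapes, and the `alpha`/`r` values are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 4, 8

# Frozen pretrained weight (stand-in for one layer of an RGB face model).
W = rng.standard_normal((d_out, d_in)) * 0.02

# LoRA factors: A is randomly initialized, B starts at zero so the
# adapted layer initially reproduces the pretrained layer exactly.
A = rng.standard_normal((r, d_in)) * 0.02
B = np.zeros((d_out, r))

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha / r) * B @ A).T -- only A and B are trained."""
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((1, d_in))   # stand-in for an event-frame feature
y0 = lora_forward(x, W, A, B, alpha, r)

# With B = 0 the adapted output matches the frozen pretrained layer.
assert np.allclose(y0, x @ W.T)
```

Because only `A` and `B` (a few thousand parameters here) are updated, the frozen RGB facial priors in `W` survive fine-tuning on the small EFace dataset, which is the motivation the paper gives for using LoRA on limited event data.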

Abstract

Event cameras offer a promising sensing modality for face recognition due to their inherent advantages in illumination robustness and privacy-friendliness. However, because event streams lack the stable photometric appearance relied upon by conventional RGB-based face recognition systems, we argue that event-based face recognition should model structure-driven spatiotemporal identity representations shaped by rigid facial motion and individual facial geometry. Since dedicated datasets for event-based face recognition remain lacking, we construct EFace, a small-scale event-based face dataset captured under rigid facial motion. To learn effectively from this limited event data, we further propose EventFace, a framework for event-based face recognition that integrates spatial structure and temporal dynamics for identity modeling. Specifically, we employ Low-Rank Adaptation (LoRA) to transfer structural facial priors from pretrained RGB face models to the event domain, thereby establishing a reliable spatial basis for identity modeling. Building on this foundation, we further introduce a Motion Prompt Encoder (MPE) to explicitly encode temporal features and a Spatiotemporal Modulator (STM) to fuse them with spatial features, thereby enhancing the representation of identity-relevant event patterns. Extensive experiments demonstrate that EventFace achieves the best performance among the evaluated baselines, with a Rank-1 identification rate of 94.19% and an equal error rate (EER) of 5.35%. Results further indicate that EventFace exhibits stronger robustness under degraded illumination than the competing methods. In addition, the learned representations exhibit reduced template reconstructability.
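The abstract does not specify how the Spatiotemporal Modulator combines the two branches. One plausible reading is a feature-wise modulation in which the temporal (motion-prompt) features predict a scale and shift applied to the spatial features. The sketch below shows that pattern with NumPy; the projection matrices `Wg`/`Wb` and all dimensions are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8

spatial = rng.standard_normal((1, d))   # structure features (LoRA-adapted branch)
temporal = rng.standard_normal((1, d))  # motion features (Motion Prompt Encoder branch)

# Hypothetical learned projections that map temporal features to
# modulation parameters for the spatial branch.
Wg = rng.standard_normal((d, d)) * 0.1
Wb = rng.standard_normal((d, d)) * 0.1

gamma = 1.0 + temporal @ Wg   # per-channel scale, centered at identity
beta = temporal @ Wb          # per-channel shift
fused = gamma * spatial + beta  # modulated spatiotemporal identity feature
```

Centering `gamma` at 1 means that if the temporal branch contributes nothing, the fused feature falls back to the spatial representation alone, so the modulation can only refine, not destroy, the structural prior.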