ECG Biometrics with ArcFace-Inception: External Validation on MIMIC and HEEDB

arXiv cs.LG / 4/7/2026


Key Points

  • The paper evaluates ECG biometrics using a 1D Inception-v1 model trained with ArcFace on a large internal clinical dataset (164,440 12-lead ECGs from 53,079 patients) and externally tests on MIMIC-IV-ECG and HEEDB.
  • Under a unified closed-set leave-one-out protocol scored with Rank@K and TAR@FAR, the system reaches Rank@1 of 0.9506 on ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC in the general-comparability setting.
  • In temporal stress tests at constant gallery size, Rank@1 falls as the year gap grows from 1 to 5 years: 0.7853→0.6433 on MIMIC and 0.6864→0.5560 on HEEDB.
  • Gallery size and domain heterogeneity substantially affect operational quality: HEEDB scale tests show monotonic degradation as the gallery grows, with recovery when more examinations per patient are available.
  • Post-hoc reranking improves retrieval on HEEDB-RR, where AS-norm raises Rank@1 to 0.8005 from a 0.7765 baseline, indicating that score processing can partially mitigate domain/scale effects.
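
To make the Rank@K figures above concrete, here is a minimal sketch of a closed-set leave-one-out evaluation. It rests on assumptions the summary does not confirm: each ECG embedding serves as the probe once, every other embedding forms the gallery, similarity is cosine, and a probe counts as a hit when any of its top K gallery matches belongs to the same patient. The function names (`cosine`, `rank_at_k`) are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_at_k(embeddings, patient_ids, k=1):
    """Leave-one-out Rank@K (assumed protocol, not the paper's exact code).

    Each record is the probe once; all remaining records form the gallery.
    """
    hits, probes = 0, 0
    for i, probe in enumerate(embeddings):
        gallery = [j for j in range(len(embeddings)) if j != i]
        # Closed-set: skip probes whose patient has no other record enrolled.
        if not any(patient_ids[j] == patient_ids[i] for j in gallery):
            continue
        # Sort gallery records by similarity to the probe, best match first.
        ranked = sorted(gallery, key=lambda j: cosine(probe, embeddings[j]),
                        reverse=True)
        hits += any(patient_ids[j] == patient_ids[i] for j in ranked[:k])
        probes += 1
    return hits / probes if probes else 0.0
```

With one probe per record, the gallery grows with the cohort, which is one reason Rank@1 would be expected to fall as more patients are enrolled, consistent with the scale results reported for HEEDB.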

Abstract

ECG biometrics has been studied mainly on small cohorts and short inter-session intervals, leaving open how identification behaves under large galleries, external domain shift, and multi-year temporal gaps. We evaluated a 1D Inception-v1 model trained with ArcFace on an internal clinical corpus of 164,440 12-lead ECGs from 53,079 patients and tested it on larger cohorts derived from MIMIC-IV-ECG and HEEDB. The study used a unified closed-set leave-one-out protocol with Rank@K and TAR@FAR metrics, together with scale, temporal-stress, reranking, and confidence analyses. Under general comparability, the system achieved Rank@1 of 0.9506 on ASUGI-DB, 0.8291 on MIMIC-GC, and 0.6884 on HEEDB-GC. In the temporal stress test at constant gallery size, Rank@1 declined from 0.7853 to 0.6433 on MIMIC and from 0.6864 to 0.5560 on HEEDB from 1 to 5 years. Scale analysis on HEEDB showed monotonic degradation as gallery size increased and recovery as more examinations per patient became available. On HEEDB-RR, post-hoc reranking further improved retrieval, with AS-norm reaching Rank@1 = 0.8005 from a 0.7765 baseline. ECG identity information therefore remains measurable under externally validated large-scale closed-set conditions, but its operational quality is strongly affected by domain heterogeneity, longitudinal drift, gallery size, and second-stage score processing.
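
The AS-norm reranking mentioned in the abstract can be illustrated with a short sketch of adaptive symmetric score normalization as it is commonly defined in the speaker-verification literature. The raw score is standardized twice, once against the probe's top-N cohort scores and once against the candidate's, and the two results are averaged. The cohort composition, the value of N, and the use of cosine similarity are assumptions here; the abstract does not state the paper's settings, and the function names are hypothetical.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_stats(embedding, cohort, n):
    """Mean and std of the n highest cohort scores for one embedding."""
    scores = sorted((cosine(embedding, c) for c in cohort), reverse=True)[:n]
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return mu, max(sigma, 1e-8)  # floor sigma to avoid division by zero


def as_norm(probe, candidate, cohort, n=3):
    """Adaptive symmetric normalization of one probe/candidate score.

    Assumed form: 0.5 * ((s - mu_p) / sd_p + (s - mu_c) / sd_c).
    """
    raw = cosine(probe, candidate)
    mu_p, sd_p = top_n_stats(probe, cohort, n)
    mu_c, sd_c = top_n_stats(candidate, cohort, n)
    return 0.5 * ((raw - mu_p) / sd_p + (raw - mu_c) / sd_c)
```

Reranking then amounts to re-sorting each probe's candidate list by `as_norm` instead of the raw score. Because the normalization discounts embeddings that score highly against many unrelated records, it is a plausible mechanism for the kind of Rank@1 gain reported on HEEDB-RR, though the paper's exact procedure may differ.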