Membership Inference Attacks Expose Participation Privacy in ECG Foundation Encoders

arXiv cs.LG · April 14, 2026


Key Points

  • Self-supervised “foundation” ECG encoders are being reused across tasks and institutions, but this reuse can leak participation privacy through model outputs or latent embeddings even when raw waveforms and labels are withheld.
  • The paper presents an audit of membership inference attacks against multiple ECG foundation encoder types, including contrastive methods (SimCLR, TS2Vec) and masked reconstruction (CNN- and Transformer-based MAE).
  • It evaluates three attacker models based on realistic interfaces: score-only black-box access to scalar outputs, adaptive learned attackers that aggregate statistics across repeated queries, and embedding-access attackers that probe the geometry of latent representations.
  • Results show participation leakage varies by objective and is strongest for small or institution-specific cohorts, while larger and more diverse pretraining datasets reduce tail risk.
  • The authors conclude that restricting access to raw signals or diagnostic labels is not sufficient to protect participation privacy, and that connected-health systems therefore require deployment-aware, interface-specific auditing.
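To make the score-only attacker concrete, the following is a minimal sketch of the subject-centric protocol the key points describe: per-window scalar outputs from the audited encoder are aggregated to a single score per subject before a membership decision is made. All names (`subject_level_mia`, the mean-pooling choice, the threshold source) are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical score-only membership inference attack with
# window-to-subject aggregation. Names and the mean-pooling
# aggregator are illustrative, not taken from the paper.
import numpy as np

def subject_level_mia(window_scores, subject_ids, threshold):
    """Aggregate per-window attack scores into one decision per subject.

    window_scores : scalar outputs from the audited encoder, one per window
    subject_ids   : subject identifier for each window
    threshold     : decision threshold, assumed calibrated on a held-out
                    non-member population
    Returns {subject_id: predicted_member (bool)}.
    """
    per_subject = {}
    for score, sid in zip(window_scores, subject_ids):
        per_subject.setdefault(sid, []).append(score)
    # Mean-pool windows per subject; an adaptive learned attacker could
    # instead feed richer statistics (variance, quantiles) from repeated
    # queries into a trained classifier.
    return {sid: float(np.mean(v)) >= threshold
            for sid, v in per_subject.items()}
```

The subject-level pooling step matters because a single individual contributes many ECG windows, and averaging their per-window scores reduces noise in the membership signal.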

Abstract

Foundation-style ECG encoders pretrained with self-supervised learning are increasingly reused across tasks, institutions, and deployment contexts, often through model-as-a-service interfaces that expose scalar scores or latent representations. While such reuse improves data efficiency and generalization, it raises a participation privacy concern: can an adversary infer whether a specific individual or cohort contributed ECG data to pretraining, even when raw waveforms and diagnostic labels are never disclosed? In connected-health settings, training participation itself may reveal institutional affiliation, study enrollment, or sensitive health context. We present an implementation-grounded audit of membership inference attacks (MIAs) against modern self-supervised ECG foundation encoders, covering contrastive objectives (SimCLR, TS2Vec) and masked reconstruction objectives (CNN- and Transformer-based MAE). We evaluate three realistic attacker interfaces: (i) score-only black-box access to scalar outputs, (ii) adaptive learned attackers that aggregate subject-level statistics across repeated queries, and (iii) embedding-access attackers that probe latent representation geometry. Using a subject-centric protocol with window-to-subject aggregation and calibration at fixed false-positive rates under a cross-dataset auditing setting, we observe heterogeneous and objective-dependent participation leakage: leakage is most pronounced in small or institution-specific cohorts and, for contrastive encoders, can saturate in embedding space, while larger and more diverse datasets substantially attenuate operational tail risk. Overall, our results show that restricting access to raw signals or labels is insufficient to guarantee participation privacy, underscoring the need for deployment-aware auditing of reusable biosignal foundation encoders in connected-health systems.
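The abstract's evaluation protocol calibrates attacks at fixed false-positive rates, i.e., it reports the true-positive rate achievable while bounding how often non-members are falsely flagged. A minimal sketch of that metric, assuming held-out member and non-member score populations (the function name and quantile-based thresholding are illustrative, not the paper's exact procedure):

```python
# Sketch of TPR at a fixed FPR: choose the threshold so that at most
# `target_fpr` of non-member scores exceed it, then measure the fraction
# of member scores above that threshold. Illustrative, not the paper's code.
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Return (threshold, TPR) for an attack calibrated at target_fpr."""
    nonmember_scores = np.asarray(nonmember_scores, dtype=float)
    # The (1 - target_fpr) quantile of non-member scores bounds the FPR.
    thr = float(np.quantile(nonmember_scores, 1.0 - target_fpr))
    tpr = float(np.mean(np.asarray(member_scores, dtype=float) > thr))
    return thr, tpr
```

Low-FPR operating points are the relevant regime for the "operational tail risk" the abstract mentions: an attack that only separates members at high FPR poses little practical re-identification threat.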