PRISM-CTG: A Foundation Model for Cardiotocography Analysis with Multi-View SSL

arXiv cs.LG / 5/6/2026


Key Points

  • The paper introduces PRISM-CTG, a clinically grounded self-supervised foundation model for cardiotocography (CTG) analysis that uses large volumes of unlabeled recordings to learn transferable representations.
  • PRISM-CTG is pretrained with a multi-view SSL framework using three complementary pretext objectives—masked signal reconstruction (guided by random projections), prediction of clinical variables, and feature classification—each with a dedicated task token and cross-attention for information sharing.
  • The approach reframes patient metadata and domain knowledge into additional supervisory targets, enabling more clinically meaningful representation learning than conventional training setups that underuse these signals.
  • Experiments across seven CTG downstream tasks (antepartum and intrapartum) show PRISM-CTG consistently outperforms in-domain and SSL baselines, with strong external generalization on two datasets.
  • The model achieves performance comparable to studies trained on substantially larger, privately labeled datasets, and the authors claim it is the first large-scale foundation model for CTG focused on domain-level representation learning.
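
The multi-view pretraining described above, in which each pretext objective gets a dedicated task token that pools information from the shared signal representation via cross-attention, can be sketched in a simplified single-head form. All names, dimensions, and the single-head design here are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32          # embedding dimension (illustrative)
T = 120         # number of encoded CTG signal patches (illustrative)
num_tasks = 3   # reconstruction, clinical-variable prediction, feature classification

# Stand-in for the encoder's output over an unlabelled CTG recording.
signal = rng.standard_normal((T, d))

# One learnable token per pretext objective (randomly initialised here).
task_tokens = rng.standard_normal((num_tasks, d))

def cross_attention(queries, keys_values, scale):
    """Single-head cross-attention: each query token pools over keys_values."""
    scores = queries @ keys_values.T / scale               # (num_tasks, T)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over patches
    return weights @ keys_values                           # (num_tasks, d)

# Each task token attends to the shared signal representation; each resulting
# vector would then feed its own head (masked reconstruction, metadata
# regression, or feature classification) with a jointly optimised loss.
task_reps = cross_attention(task_tokens, signal, scale=np.sqrt(d))
print(task_reps.shape)  # → (3, 32)
```

The point of the per-task tokens is specialisation: each objective reads from the same encoder output but summarises it differently, while the shared attention substrate lets clinically informed targets (metadata, expert features) shape the signal representation itself.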

Abstract

Supervised deep learning models for automated CTG analysis are typically constrained by narrowly curated labelled datasets and limited patient cohorts, leaving substantial volumes of physiologically informative clinical recordings untapped. To address this limitation, we propose Physiology-aware Representation Learning via Integrated Self-supervision and Metadata for CTG (PRISM-CTG), a clinically grounded self-supervised foundation model (FM) for CTG that leverages large-scale unlabelled recordings to learn transferable domain-level representations. PRISM-CTG is pretrained using a multi-view self-supervised framework that jointly optimises three complementary pretext objectives: random-projection-guided masked signal reconstruction, clinical variable prediction, and feature classification. Each objective is associated with a dedicated task-specific token, enabling specialised representation learning, while controlled cross-attention facilitates information exchange across clinical contexts. By reframing patient metadata and domain knowledge, which are often underutilised in conventional training, as prediction targets, PRISM-CTG transforms readily available clinical information into additional supervisory signals that guide clinically meaningful representation learning. Extensive experiments across seven downstream CTG tasks in both antepartum and intrapartum domains demonstrate that PRISM-CTG consistently outperforms in-domain and SSL baselines. Notably, PRISM-CTG demonstrated strong generalisation under external validation on two datasets, while achieving performance comparable to studies trained on substantially larger, privately labelled datasets. To our knowledge, this is the first study to introduce a large-scale FM for CTG that learns domain-level representations.