Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering

arXiv cs.LG / 4/9/2026



Key Points

The study evaluates how well traditional (e.g., K-means), hybrid, and deep learning clustering methods work on EHR-derived patient representations, using real heart failure data from the All of Us Research Program.
It finds that traditional clustering performs more robustly than deep clustering methods designed for image-like tasks, highlighting a domain mismatch between image clustering and tabular EHR embeddings.
To improve deep clustering, the authors propose an ensemble-based deep clustering method that aggregates cluster assignments across multiple embedding dimensions instead of relying on a single embedding space.
When combined with traditional clustering in a new ensemble framework, the proposed ensemble embedding achieves the best overall performance ranking among 14 clustering methods across multiple patient cohorts.
The paper emphasizes the importance of biological sex-specific clustering for EHR analysis and argues for combining traditional and deep clustering rather than using a single method in isolation.

Abstract

In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions, rather than relying on a single fixed embedding space. When combined with traditional clustering in a novel ensemble framework, the proposed ensemble embedding for deep clustering delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. This paper underscores the importance of biological sex-specific clustering of EHR data and the advantages of combining traditional and deep clustering approaches over a single method.
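
The abstract does not spell out how cluster assignments from multiple embedding dimensions are aggregated, so the following is only a minimal sketch of one common way to do it (a co-association, or "evidence accumulation", consensus), not the authors' method. Several parts are assumptions made for illustration: PCA stands in for the autoencoder, K-means produces the per-embedding assignments, average-linkage hierarchical clustering forms the consensus, and the function name `ensemble_cluster`, the dimensions `(2, 4, 8)`, and the synthetic data are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score


def ensemble_cluster(X, embedding_dims=(2, 4, 8), n_clusters=3, seed=0):
    """Consensus clustering over embeddings of several dimensions.

    Hypothetical helper: PCA is an assumed stand-in for a learned
    autoencoder embedding, and the co-association consensus is an
    assumed aggregation rule, not one confirmed by the paper.
    """
    n_samples = X.shape[0]
    coassoc = np.zeros((n_samples, n_samples))

    for dim in embedding_dims:
        # Embed the data at this dimensionality (assumption: PCA encoder).
        Z = PCA(n_components=dim, random_state=seed).fit_transform(X)
        # Cluster within this single embedding space.
        labels = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(Z)
        # Count, for every pair of samples, whether they share a cluster.
        coassoc += labels[:, None] == labels[None, :]

    # Fraction of embeddings in which each pair was co-clustered.
    coassoc /= len(embedding_dims)

    # Turn agreement into a distance and cut an average-linkage tree
    # into at most n_clusters consensus clusters.
    distance = 1.0 - coassoc
    tree = linkage(squareform(distance, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust") - 1


# Synthetic tabular data standing in for patient feature vectors.
X, y_true = make_blobs(
    n_samples=150, n_features=10, centers=3, cluster_std=0.5, random_state=0
)
labels = ensemble_cluster(X)
print("ARI vs. synthetic ground truth:", adjusted_rand_score(y_true, labels))
```

The point of the design is that no single embedding dimension has to be chosen in advance: pairs of patients that are grouped together under most dimensionalities end up close in the consensus distance, while pairs that are grouped together only occasionally do not. A real pipeline would replace the PCA step with trained autoencoders and would need to validate the number of clusters per cohort.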
