Federated Hierarchical Clustering with Automatic Selection of Optimal Cluster Numbers

arXiv cs.AI / 3/16/2026

Key Points

  • Fed-$k^*$-HC is a federated clustering framework that automatically determines the optimal number of clusters, $k^*$, using hierarchical clustering over micro-subcluster prototypes generated by clients.
  • It addresses the challenges of unknown cluster counts, imbalanced cluster sizes, and privacy-preserving transmission constraints in federated learning.
  • Prototypes from clients are uploaded to a server and merged hierarchically through a density-based merging design to explore clusters of varying sizes and shapes.
  • The merging process self-terminates based on neighboring relationships among the prototypes, which determines $k^*$.
  • Experiments on diverse datasets demonstrate the method's capability to accurately identify a proper number of clusters in federated clustering.
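
To make the client-side step concrete, here is a minimal sketch of how a client might summarize its local data as micro-subcluster prototypes. The summary does not say how Fed-$k^*$-HC builds its micro-subclusters, so plain k-means is used here purely as a stand-in; the function name `micro_prototypes` and the parameters `n_micro`, `n_iter`, and `seed` are hypothetical.

```python
import math
import random


def micro_prototypes(points, n_micro, n_iter=20, seed=0):
    """Over-segment one client's points with plain k-means (a stand-in,
    not the paper's procedure) and return (centroid, size) pairs.

    In this sketch these summaries are the only thing uploaded to the server.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, n_micro)  # requires n_micro <= len(points)

    def assign(cs):
        # Put every point into the group of its nearest centroid.
        groups = [[] for _ in cs]
        for p in points:
            nearest = min(range(len(cs)), key=lambda j: math.dist(p, cs[j]))
            groups[nearest].append(p)
        return groups

    for _ in range(n_iter):
        groups = assign(centroids)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]

    groups = assign(centroids)
    return [(c, len(g)) for c, g in zip(centroids, groups) if g]


# Toy client with two imbalanced groups (10 points vs. 30 points).
client_points = [(0.1 * i, 0.0) for i in range(10)] + \
                [(5.0 + 0.1 * i, 5.0) for i in range(30)]
prototypes = micro_prototypes(client_points, n_micro=4)
```

Choosing `n_micro` larger than the expected number of clusters is deliberate in this sketch: the client over-segments, and the decision about how many clusters really exist is deferred to the server.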

Abstract

Federated Clustering (FC) is an emerging and promising solution for exploring data distribution patterns from distributed and privacy-protected data in an unsupervised manner. Existing FC methods implicitly rely on the assumption that clients hold a known number of uniformly sized clusters. However, the true number of clusters is typically unknown, and cluster sizes are naturally imbalanced in real scenarios. Furthermore, the privacy-preserving transmission constraints in federated learning inevitably reduce usable information, making the development of robust and accurate FC extremely challenging. Accordingly, we propose a novel FC framework named Fed-$k^*$-HC, which can automatically determine an optimal number of clusters $k^*$ based on the data distribution explored through hierarchical clustering. To obtain the global data distribution for $k^*$ determination, we let each client generate micro-subclusters. Their prototypes are then uploaded to the server for hierarchical merging. The density-based merging design allows exploring clusters of varying sizes and shapes, and the progressive merging process can self-terminate according to the neighboring relationships among the prototypes to determine $k^*$. Extensive experiments on diverse datasets demonstrate the FC capability of the proposed Fed-$k^*$-HC in accurately exploring a proper number of clusters.
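
The server-side idea of merging prototypes until the process stops on its own can be illustrated with a small sketch. The abstract does not give the actual merging rule or stopping criterion, so the version below substitutes a simple one: single-linkage merging in order of increasing distance, halting once the next pair is farther apart than `ratio` times the median nearest-neighbor distance among the prototypes. The function name `estimate_k_star`, the `ratio` parameter, and that threshold rule are all assumptions made for illustration, not the paper's design.

```python
import math
from statistics import median


def estimate_k_star(prototypes, ratio=3.0):
    """Merge pooled prototypes bottom-up and return (k, labels).

    Stand-in stopping rule (an assumption): stop when the closest remaining
    pair is farther apart than ratio * median nearest-neighbor distance.
    Needs at least two prototypes.
    """
    n = len(prototypes)
    # Nearest-neighbor distance of each prototype, used as a local scale.
    nn = [
        min(math.dist(prototypes[i], prototypes[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    threshold = ratio * median(nn)

    parent = list(range(n))  # union-find forest over prototypes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairs, closest first, so merges happen in hierarchical order.
    edges = sorted(
        (math.dist(prototypes[i], prototypes[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    k = n
    for d, i, j in edges:
        if d > threshold:
            break  # self-terminate: no remaining pair counts as neighbors
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            k -= 1

    return k, [find(i) for i in range(n)]


# Prototypes pooled from clients: two well-separated groups of four.
pooled = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5)]
k_star, labels = estimate_k_star(pooled)  # k_star is 2 for this toy input
```

Because the threshold is derived from the prototypes themselves rather than fixed in advance, the number of clusters falls out of the data in this sketch. A single global threshold is cruder than a density-based rule, though, and would likely struggle when clusters differ greatly in density.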