Composite Silhouette: A Subsampling-based Aggregation Strategy
arXiv cs.LG / 4/16/2026
Key Points
- The paper addresses unsupervised model selection for estimating the number of clusters, highlighting that the standard (micro-averaged) Silhouette coefficient can be biased toward larger clusters when cluster sizes are imbalanced.
- It proposes “Composite Silhouette,” which aggregates information across multiple subsampled clusterings instead of relying on a single partition, aiming to reduce both size-bias and noise from small clusters.
- For each subsample, the method adaptively combines micro- and macro-averaged Silhouette scores using a convex weight based on normalized discrepancy, smoothed by a bounded nonlinearity to control overreactions.
- The authors prove theoretical properties and provide finite-sample concentration guarantees for the subsampling-based estimate.
- Experiments on synthetic and real-world datasets show that Composite Silhouette recovers the ground-truth number of clusters more reliably than either the micro- or the macro-averaged Silhouette used alone.
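The key points above can be made concrete with a small sketch. It is not the authors' implementation: the summary does not give the exact formulas, so the discrepancy normalization, the use of `tanh` as the bounded nonlinearity, the direction of the weighting (more weight on the macro score as the two averages diverge), the mean over subsamples, and the names `composite_score`, `composite_silhouette`, `cluster_fn`, `scale`, and `frac` are all assumptions made for illustration. Only the micro and macro averages follow the standard Silhouette definitions.

```python
import math
import random
from collections import defaultdict


def silhouette_samples(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), Euclidean distance."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # common convention: undefined cases score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


def micro_macro(points, labels):
    """Micro = mean over points; macro = mean of per-cluster means."""
    s = silhouette_samples(points, labels)
    per_cluster = defaultdict(list)
    for value, lab in zip(s, labels):
        per_cluster[lab].append(value)
    micro = sum(s) / len(s)
    macro = sum(sum(v) / len(v) for v in per_cluster.values()) / len(per_cluster)
    return micro, macro


def composite_score(micro, macro, scale=1.0):
    """Convex combination of micro and macro (ASSUMED weighting, see lead-in)."""
    denom = abs(micro) + abs(macro)
    discrepancy = abs(micro - macro) / denom if denom > 0 else 0.0  # in [0, 1]
    w = math.tanh(scale * discrepancy)  # bounded weight in [0, 1)
    return (1.0 - w) * micro + w * macro


def composite_silhouette(points, cluster_fn, n_subsamples=20, frac=0.8, seed=0):
    """Average the per-subsample composite score (ASSUMED aggregation: mean)."""
    rng = random.Random(seed)
    m = max(2, int(frac * len(points)))
    values = []
    for _ in range(n_subsamples):
        sub = rng.sample(points, m)   # subsample without replacement
        labels = cluster_fn(sub)      # hypothetical: any clustering routine
        micro, macro = micro_macro(sub, labels)
        values.append(composite_score(micro, macro))
    return sum(values) / len(values)


# Toy imbalanced data: 8 points near the origin, 4 points near (5, 5).
points = [(0.01 * i, 0.0) for i in range(8)] + [(5.0 + 0.01 * i, 5.0) for i in range(4)]
labels = [0] * 8 + [1] * 4
micro, macro = micro_macro(points, labels)
score = composite_silhouette(points, lambda sub: [0 if p[0] < 2.5 else 1 for p in sub])
```

To select the number of clusters under this sketch, one would compute `composite_silhouette` once per candidate value, passing a `cluster_fn` that fits that many clusters, and keep the candidate with the highest score.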