Assessing the impact of dimensionality reduction on clustering performance -- a systematic study

arXiv cs.LG / 4/27/2026

💬 Opinion · Models & Research

Key Points

  • The paper conducts a systematic evaluation of how five dimensionality reduction methods (PCA, Kernel PCA, VAE, Isomap, and MDS) affect clustering performance on high-dimensional data.
  • It benchmarks four clustering algorithms (k-means, agglomerative hierarchical clustering, GMM, and OPTICS) using the Adjusted Rand Index (ARI) to compare results with and without dimensionality reduction.
  • The study tests the reduction levels suggested in prior literature (k−1 dimensions, where k is the number of clusters, and 25% and 50% of the original dimensionality) to measure how the aggressiveness of reduction affects outcomes; a minimal code sketch of this evaluation grid follows the list.
  • Results indicate that both the choice of dimensionality reduction technique and the reduction target level must be selected to match the data’s intrinsic geometry and the specific clustering algorithm.
  • The work highlights remaining gaps in comprehensive cross-method, cross-data-type assessment for dimensionality reduction in clustering pipelines.
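
The sketch below wires this evaluation grid together with scikit-learn. It is a minimal illustration, not the paper's code: the digits dataset is a stand-in, the subsample size and hyperparameters are assumptions, and the VAE branch is omitted because it requires a deep-learning framework.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import Isomap, MDS
from sklearn.cluster import KMeans, AgglomerativeClustering, OPTICS
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Stand-in dataset: 64-dimensional digits, subsampled so Isomap/MDS stay fast.
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]
X = StandardScaler().fit_transform(X)
k = len(np.unique(y))  # number of ground-truth clusters

# Four of the five reducers from the paper; the VAE is omitted here.
reducers = {
    "PCA":        lambda d: PCA(n_components=d),
    "Kernel PCA": lambda d: KernelPCA(n_components=d, kernel="rbf"),
    "Isomap":     lambda d: Isomap(n_components=d),
    "MDS":        lambda d: MDS(n_components=d),
}

def cluster(algo, Z):
    """Fit one of the four clustering algorithms and return its labels."""
    if algo == "k-means":
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    if algo == "AHC":
        return AgglomerativeClustering(n_clusters=k).fit_predict(Z)
    if algo == "GMM":
        return GaussianMixture(n_components=k, random_state=0).fit_predict(Z)
    return OPTICS(min_samples=10).fit_predict(Z)  # density-based, no k needed

# Reduction levels from the paper: k-1, and 25% / 50% of the original dimensions.
levels = {"k-1": k - 1, "25%": X.shape[1] // 4, "50%": X.shape[1] // 2}

for algo in ["k-means", "AHC", "GMM", "OPTICS"]:
    print(f"{algo}: baseline ARI = {adjusted_rand_score(y, cluster(algo, X)):.3f}")
    for rname, make in reducers.items():
        for lname, d in levels.items():
            Z = make(d).fit_transform(X)
            print(f"  {rname} @ {lname} ({d} dims): "
                  f"ARI = {adjusted_rand_score(y, cluster(algo, Z)):.3f}")
```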

Abstract

Dimensionality reduction is a critical preprocessing step for clustering high-dimensional data, yet comprehensive evaluation of its impact across diverse methods and data types remains limited. In this study, we systematically assess the influence of five dimensionality reduction techniques - Principal Component Analysis (PCA), Kernel Principal Component Analysis (Kernel PCA), Variational Autoencoder (VAE), Isometric Mapping (Isomap), and Multidimensional Scaling (MDS) - on the performance of four popular clustering algorithms - k-means, Agglomerative Hierarchical Clustering (AHC), Gaussian Mixture Models (GMM), and Ordering Points to Identify the Clustering Structure (OPTICS). We evaluate clustering quality using the Adjusted Rand Index (ARI), comparing results with and without dimensionality reduction at the reduction levels recommended in the literature (i.e., k-1, where k is the number of clusters, and 25% and 50% of the original number of dimensions). Our findings underscore the importance of carefully selecting both the dimensionality reduction technique and the reduction level, tailored to the intrinsic geometry of the data and the clustering algorithm under consideration.
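
For readers unfamiliar with the metric, ARI scores the agreement between two partitions, corrected for chance: 1 for identical partitions (up to a relabeling of clusters), around 0 for uninformative or random assignments, and negative values for worse-than-chance agreement. A tiny self-contained check, with made-up labels for illustration:

```python
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 1, 1, 2, 2]
# ARI is invariant to how clusters are named: a relabeled copy scores 1.0.
print(adjusted_rand_score(truth, [2, 2, 0, 0, 1, 1]))  # 1.0
# Any genuinely different partition scores strictly below 1.0.
print(adjusted_rand_score(truth, [0, 0, 1, 2, 2, 2]))  # < 1.0
# A degenerate single-cluster labeling carries no information and scores 0.0.
print(adjusted_rand_score(truth, [0, 0, 0, 0, 0, 0]))  # 0.0
```

The chance correction is what makes ARI suitable for the paper's comparisons: raw agreement scores would inflate as the number of clusters or the imbalance between them changes across reduction levels.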