Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

arXiv stat.ML / 4/16/2026

📰 NewsIdeas & Deep AnalysisModels & Research

Key Points

  • The paper introduces a joint learning framework that performs dimensionality reduction and clustering at the same time to address the difficulty of clustering high-dimensional data.
  • It learns the parameters of a dimensionality reduction method (such as a linear projection or a neural network) while simultaneously optimizing cluster assignments using a gradient-based manifold optimization approach.
  • A key example uses a Gaussian Mixture Model (GMM) on the learned low-dimensional features, drawing a loose analogy to unsupervised Linear Discriminant Analysis (LDA).
  • The method is evaluated on simulated unsupervised data and the MNIST benchmark dataset, where results reportedly outperform several established clustering algorithms.
  • Overall, the work positions manifold optimization as a mechanism for jointly searching over both projection parameters and cluster structure in an unsupervised setting.

Abstract

Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g. linear projection or a neural network) and cluster the data based on the resulting features (e.g. under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold,using Gradient Manifold Optimization. The obtained The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somehow similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e. MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.