Curvature-Aware PCA with Geodesic Tangent Space Aggregation for Semi-Supervised Learning

arXiv cs.AI / 4/22/2026

💬 Opinion · Models & Research

Key Points

  • The paper argues that standard PCA can’t model data living on curved manifolds, while common manifold learning methods may lose PCA-like spectral stability and structure.
  • It introduces Geodesic Tangent Space Aggregation PCA (GTSA-PCA), which replaces the global covariance with curvature-weighted local covariances over a k-nearest-neighbor graph to form locally adaptive tangent subspaces.
  • GTSA-PCA adds a geodesic alignment operator that uses intrinsic graph distances together with subspace affinities to synchronize local representations in a unified spectral framework.
  • The method supports semi-supervised guidance during alignment to improve discriminative embeddings with minimal labeled data.
  • Experiments report consistent gains over PCA, Kernel PCA, Supervised PCA, and strong graph-based baselines (e.g., UMAP), especially for small-sample and high-curvature settings.
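The curvature-weighted local covariance step in the second bullet can be sketched as follows. This is an illustrative reading, not the paper's exact construction: the function name, the use of residual eigenvalue mass as a curvature proxy, and the exponential weighting are all assumptions.

```python
import numpy as np

def local_tangent_subspaces(X, k=10, d=2, beta=1.0):
    """Sketch: curvature-weighted local PCA over a k-NN graph.

    For each point, eigendecompose the covariance of its k nearest
    neighbors; the eigenvalue mass outside the top-d directions serves
    as a crude curvature proxy, converted into a weight that
    down-weights high-curvature neighborhoods. Illustrative only.
    """
    n = X.shape[0]
    # Pairwise distances and k nearest neighbors (brute force).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]  # drop self at index 0

    bases, weights = [], []
    for i in range(n):
        nbrs = X[knn[i]]
        C = np.cov(nbrs - nbrs.mean(0), rowvar=False)
        evals, evecs = np.linalg.eigh(C)          # ascending eigenvalues
        tangent = evecs[:, -d:]                   # top-d tangent basis
        residual = evals[:-d].sum() / max(evals.sum(), 1e-12)
        weights.append(np.exp(-beta * residual))  # curvature weight in (0, 1]
        bases.append(tangent)
    return np.stack(bases), np.array(weights), knn
```

A flat neighborhood concentrates its variance in the top-d eigenvalues, so `residual` is near zero and the weight near one; sharply curved neighborhoods leak variance into the normal directions and get suppressed.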

Abstract

Principal Component Analysis (PCA) is a fundamental tool for representation learning, but its global linear formulation fails to capture the structure of data supported on curved manifolds. In contrast, manifold learning methods model nonlinearity but often sacrifice the spectral structure and stability of PCA. We propose *Geodesic Tangent Space Aggregation PCA (GTSA-PCA)*, a geometric extension of PCA that integrates curvature awareness and geodesic consistency within a unified spectral framework. Our approach replaces the global covariance operator with curvature-weighted local covariance operators defined over a k-nearest-neighbor graph, yielding local tangent subspaces that adapt to the manifold while suppressing high-curvature distortions. We then introduce a geodesic alignment operator that combines intrinsic graph distances with subspace affinities to globally synchronize these local representations. The resulting operator admits a spectral decomposition whose leading components define a geometry-aware embedding. We further incorporate semi-supervised information to guide the alignment, improving discriminative structure with minimal supervision. Experiments on real datasets show consistent improvements over PCA, Kernel PCA, Supervised PCA, and strong graph-based baselines such as UMAP, particularly in small-sample-size and high-curvature regimes. Our results position GTSA-PCA as a principled bridge between statistical and geometric approaches to dimensionality reduction.
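The abstract's second ingredient, an alignment operator combining intrinsic graph distances with subspace affinities followed by a spectral decomposition, can be sketched end to end. This is a plausible reading of the description, not the paper's actual operator: the Gaussian geodesic kernel, the principal-angle affinity, and the symmetric normalization are all assumed choices, and the curvature weights and semi-supervised guidance are omitted for brevity.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def geodesic_alignment_embedding(X, k=10, d=2, sigma=1.0):
    """Sketch: geodesic alignment of local tangent subspaces.

    Builds an affinity from graph geodesic (shortest-path) distances
    and tangent-subspace agreement, then embeds via the leading
    eigenvectors of the normalized operator. Illustrative only.
    """
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]

    # Local tangent bases via plain local PCA.
    bases = []
    for i in range(n):
        nbrs = X[knn[i]] - X[knn[i]].mean(0)
        _, _, Vt = np.linalg.svd(nbrs, full_matrices=False)
        bases.append(Vt[:d].T)                    # (p, d) tangent basis

    # Geodesic distances over the k-NN graph (Dijkstra).
    rows = np.repeat(np.arange(n), k)
    cols = knn.ravel()
    W = csr_matrix((D[rows, cols], (rows, cols)), shape=(n, n))
    G = shortest_path(W, method="D", directed=False)

    # Subspace affinity: mean squared cosine of principal angles.
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = np.linalg.svd(bases[i].T @ bases[j], compute_uv=False)
            A[i, j] = np.mean(s ** 2)

    # Combined operator: geodesic kernel modulated by subspace agreement.
    K = np.exp(-(G ** 2) / (2 * sigma ** 2)) * A
    deg = K.sum(1)                                # deg >= 1 since K[i, i] = 1
    K_sym = K / np.sqrt(np.outer(deg, deg))       # symmetric normalization
    _, evecs = np.linalg.eigh(K_sym)
    return evecs[:, -d:]                          # leading components
```

Disconnected graph components yield infinite geodesic distances, which the Gaussian kernel maps to zero affinity, so the embedding degrades gracefully rather than failing.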