AI Navigate

GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need

arXiv cs.LG / 3/12/2026


Key Points

  • The authors propose a geometry-grounded framework to compare two datasets by modeling linear relations with the co-span constraint Ax = By = z and using the generalized SVD (GSVD) to create a shared coordinate system for the two subspaces.
  • They factor the data as A = HCU and B = HSV with CᵀC + SᵀS = I, which separates shared from dataset-specific directions through the diagonal structure of (C, S).
  • A per-sample, interpretable angle score θ(z) ∈ [0, π/2] is derived to quantify whether a sample is explained relatively more by A, more by B, or comparably by both.
  • The approach is illustrated on MNIST through angle distributions and representative GSVD directions, and a binary classifier derived from θ(z) is presented as an illustrative application of the score as an interpretable diagnostic tool.
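The factorization in the second point can be sketched numerically. The snippet below is a minimal sketch under stated assumptions, not the authors' implementation: it assumes A (m × n_A) and B (m × n_B) share the ambient dimension m and both have full row rank (so n_A, n_B ≥ m), and it obtains a thin GSVD from the symmetric-definite generalized eigenproblem AAᵀx = λ(AAᵀ + BBᵀ)x via scipy.linalg.eigh, taking c = √λ and s = √(1 − λ). The helper name thin_gsvd is hypothetical. Forming AAᵀ squares the condition number, so a QR plus CS-decomposition route is preferable for ill-conditioned data.

```python
import numpy as np
from scipy.linalg import eigh


def thin_gsvd(A, B):
    """Hypothetical helper: thin GSVD A = H diag(c) U, B = H diag(s) V.

    Assumes A is m x nA and B is m x nB with full row rank m (so nA, nB >= m),
    which keeps every c_i and s_i strictly positive.
    """
    AAt = A @ A.T
    BBt = B @ B.T
    # Solve AAt x = lam (AAt + BBt) x; eigh normalizes X so that
    # X.T @ (AAt + BBt) @ X = I and X.T @ AAt @ X = diag(lam).
    lam, X = eigh(AAt, AAt + BBt)
    lam = np.clip(lam, 0.0, 1.0)   # guard against tiny round-off outside [0, 1]
    c = np.sqrt(lam)               # weights tied to A
    s = np.sqrt(1.0 - lam)         # weights tied to B, so c**2 + s**2 = 1
    H = np.linalg.inv(X.T)         # shared, generally non-orthogonal basis of R^m
    # Rows of X.T @ A are orthogonal with norms c_i; dividing by c gives
    # U with orthonormal rows (likewise V from X.T @ B and s).
    U = (X.T @ A) / c[:, None]
    V = (X.T @ B) / s[:, None]
    return H, c, s, U, V


# Usage on random data sharing the ambient dimension m = 5.
rng = np.random.default_rng(0)
m = 5
A = rng.standard_normal((m, 8))
B = rng.standard_normal((m, 12))
H, c, s, U, V = thin_gsvd(A, B)

# Per-direction angle in [0, pi/2] under this sketch's convention:
# 0 means the direction belongs to A only, pi/2 means B only.
angles = np.arctan2(s, c)
print(np.allclose(A, H @ np.diag(c) @ U), np.allclose(B, H @ np.diag(s) @ V))
print(np.degrees(angles))
```

Because eigh returns eigenvalues in ascending order, the directions come out sorted from most B-dominated (small c) to most A-dominated (large c); intermediate angles mark directions shared by both datasets.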

Abstract

Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between two data matrices, expressed via the co-span constraint Ax = By = z in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form A = HCU, B = HSV with CᵀC + SᵀS = I, which separates shared versus dataset-specific directions through the diagonal structure of (C, S). From these factors we derive an interpretable *angle score* θ(z) ∈ [0, π/2] for a sample z, quantifying whether z is explained relatively more by A, more by B, or comparably by both. The primary role of θ(z) is as a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from θ(z) is presented as an illustrative application of the score as an interpretable diagnostic tool.
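The summary does not give the formula for θ(z), so the snippet below is only one plausible, hypothetical construction consistent with the description, not the paper's definition. It expresses a sample in the shared basis, w = H⁻¹z, splits its energy into an A-weighted part ‖Cw‖ and a B-weighted part ‖Sw‖ (which satisfy ‖Cw‖² + ‖Sw‖² = ‖w‖² because c_i² + s_i² = 1), and returns θ(z) = arctan2(‖Sw‖, ‖Cw‖) ∈ [0, π/2]. The function names angle_score and classify, and the π/4 threshold of the binary rule, are hypothetical choices; H, c and s are assumed to come from a GSVD with the A = HCU, B = HSV structure and diagonal C = diag(c), S = diag(s).

```python
import numpy as np


def angle_score(z, H, c, s):
    """Hypothetical per-sample angle score theta(z) in [0, pi/2].

    Assumes c**2 + s**2 == 1 elementwise (diagonal GSVD factors).
    """
    w = np.linalg.solve(H, z)       # coordinates of z in the columns of H
    a_part = np.linalg.norm(c * w)  # energy attributed to A
    b_part = np.linalg.norm(s * w)  # energy attributed to B
    # arctan2 of two non-negative numbers lies in [0, pi/2]:
    # 0 -> explained by A, pi/2 -> explained by B, pi/4 -> comparable.
    return np.arctan2(b_part, a_part)


def classify(z, H, c, s, threshold=np.pi / 4):
    """Hypothetical binary rule: label 'B' if theta(z) exceeds threshold, else 'A'."""
    return "B" if angle_score(z, H, c, s) > threshold else "A"


# Toy factors: identity basis, one A-only, one shared and one B-only direction.
H = np.eye(3)
c = np.array([1.0, 0.6, 0.0])
s = np.array([0.0, 0.8, 1.0])
for z in np.eye(3):
    print(np.degrees(angle_score(z, H, c, s)), classify(z, H, c, s))
```

With these toy factors the three basis vectors score 0°, about 53° and 90°, so the first is labeled A and the other two B. In practice the threshold would more sensibly be tuned on labeled samples than fixed at π/4.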