GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need

arXiv cs.LG / 3/12/2026


Key Points

The authors propose a geometry-grounded framework to compare two datasets by modeling linear relations with the co-span constraint Ax = By = z and using the generalized SVD (GSVD) to create a shared coordinate system for the subspaces.
They factor the data as A = HCU and B = HSV with C^T C + S^T S = I, which separates shared versus dataset-specific directions through the (C, S) diagonal structure.
A per-sample interpretable angle score theta(z) in [0, π/2] is derived to quantify whether a sample is explained more by A, more by B, or similarly by both.
The approach is demonstrated on MNIST to illustrate angle distributions and directions, and a binary classifier based on theta(z) is presented as a practical diagnostic tool.

Abstract

Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between two data matrices, expressed via the co-span constraint $Ax = By = z$ in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$, which separates shared versus dataset-specific directions through the diagonal structure of $(C, S)$. From these factors we derive an interpretable *angle score* $\theta(z) \in [0, \pi/2]$ for a sample $z$, quantifying whether $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The primary role of $\theta(z)$ is as a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.
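
To make the factorization concrete, here is a minimal NumPy sketch. It builds factors with a shared left matrix $H$ so that $A = HCU$ and $B = HSV$, and reads off one angle per GSVD direction as $\arctan(s_i / c_i)$. Several pieces are assumptions rather than details from the paper: the function names `gsvd_shared_left` and `angle_score` are hypothetical, the construction (an SVD of the stacked transposes followed by a cosine-sine step) is one standard way to obtain such factors, and the per-sample score shown here (the energy of a sample's $H$-coordinates weighted by $C$ versus $S$) is only one plausible reading of $\theta(z)$, not necessarily the authors' definition.

```python
import numpy as np

def gsvd_shared_left(A, B, tol=1e-10):
    """Sketch: factors with A = H @ CU and B = H @ SV (shared left factor H).

    A is d x m and B is d x n; their columns live in the same ambient space.
    Returns H (d x r), c and s (length r, with c**2 + s**2 == 1),
    and the products CU = C @ U (r x m) and SV = S @ V (r x n).
    """
    m = A.shape[1]
    K = np.vstack([A.T, B.T])                    # (m + n) x d
    Q, sig, Pt = np.linalg.svd(K, full_matrices=False)
    r = int(np.sum(sig > tol * sig[0]))          # numerical rank of [A B]
    Q = Q[:, :r]
    R = sig[:r, None] * Pt[:r]                   # K ~= Q @ R, R is r x d
    Q1, Q2 = Q[:m], Q[m:]                        # Q1.T @ Q1 + Q2.T @ Q2 = I
    # Cosine-sine step: eigenvalues of Q1.T @ Q1 are the squared cosines.
    c2, W = np.linalg.eigh(Q1.T @ Q1)
    c2 = np.clip(c2, 0.0, 1.0)
    c, s = np.sqrt(c2), np.sqrt(1.0 - c2)
    H = R.T @ W                                  # shared coordinate system
    return H, c, s, (Q1 @ W).T, (Q2 @ W).T

def angle_score(z, H, c, s):
    """Hypothetical per-sample score in [0, pi/2] (an assumption, see lead-in).

    0 means z sits in A-specific directions, pi/2 in B-specific ones.
    """
    w, *_ = np.linalg.lstsq(H, z, rcond=None)    # coordinates of z in H
    return float(np.arctan2(np.linalg.norm(s * w), np.linalg.norm(c * w)))

# Toy data: A spans {e1, e2}, B spans {e2, e3} inside R^4.
E = np.eye(4)
A, B = E[:, [0, 1]], E[:, [1, 2]]
H, c, s, CU, SV = gsvd_shared_left(A, B)
direction_angles = np.arctan2(s, c)              # one angle per GSVD direction
scores = {
    "A-only": angle_score(E[:, 0], H, c, s),
    "shared": angle_score(E[:, 1], H, c, s),
    "B-only": angle_score(E[:, 2], H, c, s),
}
```

In this toy setup the three directions have angles close to 0, π/4, and π/2, and the three sample scores follow the same pattern: e1 is reachable only through A, e2 through both, and e3 only through B. For real data, a LAPACK-backed GSVD routine would usually be preferable to this hand-rolled construction.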
