Geometric Stability: The Missing Axis of Representations

arXiv stat.ML · April 21, 2026


Key Points

  • The paper introduces “geometric stability,” a new way to evaluate how robust a neural representation’s pairwise distance structure is under perturbations, rather than only how similar two representations are.
  • The proposed metric, Shesha, uses split-half correlations of representational dissimilarity matrices from complementary feature subsets, and it is intentionally not invariant to orthogonal transformations—allowing it to detect compression-driven damage that similarity metrics miss.
  • Spectral analysis shows why: similarity metrics collapse once the top principal component is removed, while geometric stability remains informative across the full eigenspectrum.
  • Experiments across 2,463 encoder configurations and 7 domains show stability and similarity are essentially uncorrelated (ρ ≈ -0.01), and a “geometric tax” emerges: DINOv2, despite being the top model for transfer learning, ranks last in geometric stability on 5 of 6 datasets.
  • The authors report that contrastive alignment and hierarchical architectures are associated with higher stability, offering guidance for selecting models when representational reliability is important for deployment.
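The split-half idea behind Shesha can be sketched in a few lines. The function below is a minimal illustration, not the paper's exact estimator: the choice of Euclidean distance, Spearman correlation, and a single random split are assumptions made here for concreteness.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def shesha_stability(X, rng=None):
    """Split-half geometric stability of a representation X (n_samples x n_features).

    Sketch (assumed details, not the paper's exact recipe): randomly split the
    feature dimensions into two complementary halves, build a representational
    dissimilarity matrix (RDM) from each half, and rank-correlate the two RDMs.
    """
    rng = np.random.default_rng(rng)
    n_features = X.shape[1]
    perm = rng.permutation(n_features)
    half_a, half_b = perm[: n_features // 2], perm[n_features // 2 :]
    # Condensed upper-triangular RDMs from each complementary feature subset.
    rdm_a = pdist(X[:, half_a], metric="euclidean")
    rdm_b = pdist(X[:, half_b], metric="euclidean")
    # Self-consistency: rank correlation of the two distance structures.
    rho, _ = spearmanr(rdm_a, rdm_b)
    return rho
```

For a representation with shared low-dimensional structure (e.g., a rank-1 embedding), any feature subset reproduces the same pairwise geometry and the score approaches 1; for unstructured features the two halves disagree and the score drops.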

Abstract

Representational similarity analysis and related methods have become standard tools for comparing the internal geometries of neural networks and biological systems. These methods measure what is represented, the alignment between two representational spaces, but not whether that structure is robust. We introduce geometric stability, a distinct dimension of representational quality that quantifies how reliably a representation's pairwise distance structure holds under perturbation. Our metric, Shesha, measures self-consistency through split-half correlation of representational dissimilarity matrices constructed from complementary feature subsets. A key formal property distinguishes stability from similarity: Shesha is not invariant to orthogonal transformations of the feature space, unlike CKA and Procrustes, enabling it to detect compression-induced damage to manifold structure that similarity metrics cannot see. Spectral analysis reveals the mechanism: similarity metrics collapse after removing the top principal component, while stability retains sensitivity across the eigenspectrum. Across 2,463 encoder configurations in seven domains -- language, vision, audio, video, protein sequences, molecular profiles, and neural population recordings -- stability and similarity are empirically uncorrelated (ρ = -0.01). A regime analysis shows this independence arises from opposing effects: geometry-preserving transformations make the metrics redundant, while compression makes them anti-correlated, canceling in aggregate. Applied to 94 pretrained models across 6 datasets, stability exposes a "geometric tax": DINOv2, the top-performing model for transfer learning, ranks last in geometric stability on 5/6 datasets. Contrastive alignment and hierarchical architecture predict stability, providing actionable guidance for model selection in deployment contexts where representational reliability matters.
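The invariance claim in the abstract can be checked directly. The snippet below is a self-contained sketch (the linear-CKA and split-half implementations are standard textbook forms chosen here for illustration, not code from the paper): rotating the feature space by a random orthogonal matrix leaves linear CKA unchanged, while a split-half RDM correlation generally shifts.

```python
import numpy as np

def linear_cka(X, Y):
    # Linear CKA between two centered representations (standard form).
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    return num / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def split_half_rdm_corr(X):
    # Correlate pairwise Euclidean distances computed from the two feature halves.
    h = X.shape[1] // 2
    def rdm(A):
        d = A[:, None, :] - A[None, :, :]
        return np.sqrt((d ** 2).sum(-1))[np.triu_indices(len(A), 1)]
    return np.corrcoef(rdm(X[:, :h]), rdm(X[:, h:]))[0, 1]

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
# Similarity is invariant under an orthogonal rotation of features ...
assert np.isclose(linear_cka(X, X @ Q), linear_cka(X, X))
# ... while split-half stability depends on the feature basis and shifts.
s_before, s_after = split_half_rdm_corr(X), split_half_rdm_corr(X @ Q)
```

This is the mechanism the paper exploits: because compression steps act in a particular feature basis, a basis-sensitive metric can register damage that basis-invariant similarity scores cannot.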