Aitchison Embeddings for Learning Compositional Graph Representations

arXiv cs.LG / 5/4/2026

Key Points

  • The paper introduces a compositional graph embedding framework that models each node as a mixture over latent archetypal factors, rather than as an opaque embedding vector that is hard to interpret.
  • It leverages Aitchison geometry, the canonical geometry for comparing probability-like mixtures, and uses isometric log-ratio (ILR) coordinates so embeddings preserve Aitchison distances while allowing unconstrained optimization in Euclidean space (see the sketch after this list).
  • The resulting embeddings are intrinsically interpretable because their geometry directly reflects trade-offs among archetypes, avoiding the need for post-hoc explanation methods.
  • Experiments on node classification and link prediction show competitive performance against strong baselines while also enabling coherent behavior under component restriction.
  • Subcompositional coherence supports principled removal and renormalization of components; the authors exploit this in subcompositional dimensionality-removal analyses that probe how groups of archetypes affect representations and predictions.

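The ILR construction mentioned in the second key point is standard in compositional data analysis and easy to sketch. The NumPy snippet below is a minimal illustration, not the paper's implementation: the helper names and the Helmert-style contrast basis are assumptions. It builds ILR coordinates and checks that Euclidean distance between ILR vectors equals the Aitchison (clr) distance on the simplex.

```python
# Minimal sketch of the Aitchison/ILR machinery, using NumPy only.
# The helper names and the Helmert-style basis are illustrative choices,
# not taken from the paper.
import numpy as np

def closure(x):
    """Renormalize a positive vector so it lies on the simplex."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def clr(x):
    """Centered log-ratio transform: log parts minus their geometric mean."""
    logx = np.log(x)
    return logx - logx.mean()

def helmert_basis(d):
    """One fixed orthonormal basis of the zero-sum (clr) hyperplane."""
    V = np.zeros((d - 1, d))
    for i in range(1, d):
        V[i - 1, :i] = 1.0 / i   # equal weight on the first i parts
        V[i - 1, i] = -1.0       # contrasted against part i
        V[i - 1] /= np.linalg.norm(V[i - 1])  # unit-length contrast
    return V

def ilr(x, V):
    """Isometric log-ratio coordinates: project clr(x) onto the basis rows."""
    return V @ clr(x)

# Distances agree: the Aitchison distance on the simplex equals the
# Euclidean distance between the corresponding ILR coordinate vectors.
V = helmert_basis(4)
p = closure([0.5, 0.2, 0.2, 0.1])
q = closure([0.1, 0.3, 0.4, 0.2])
assert np.isclose(np.linalg.norm(clr(p) - clr(q)),
                  np.linalg.norm(ilr(p, V) - ilr(q, V)))
```

Because ILR coordinates live in unconstrained Euclidean space, a model can train them with ordinary gradient methods; the learnable-basis variant mentioned in the abstract would presumably replace the fixed Helmert contrasts with a trainable orthonormal matrix.
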
Abstract

Representation learning is central to graph machine learning, powering tasks such as link prediction and node classification. However, most graph embeddings are hard to interpret, offering limited insight into how learned features relate to graph structure. Many networks naturally admit a role-mixture view, where nodes are best described as mixtures over latent archetypal factors. Motivated by this structure, we propose a compositional graph embedding framework grounded in Aitchison geometry, the canonical geometry for comparing mixtures. Nodes are represented as simplex-valued compositions and embedded via isometric log-ratio (ILR) coordinates, which preserve Aitchison distances while enabling unconstrained optimization in Euclidean space. This yields intrinsically interpretable embeddings whose geometry reflects relative trade-offs among archetypes and supports coherent behavior under component restriction; we consider both fixed and learnable ILR bases. Across node classification and link prediction, our method achieves competitive performance with strong baselines while providing explainability by construction rather than post-hoc. Finally, subcompositional coherence enables principled component restriction: removing and renormalizing subsets preserves a well-defined geometry, which we exploit via subcompositional dimensionality removal to probe how archetype groups influence representations and predictions.
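
To make the component-restriction idea concrete, here is a hedged sketch of the subcomposition step: keep a subset of archetype components, renormalize them back onto a smaller simplex, and compute log-ratio quantities there. The function names are hypothetical, and the paper's exact protocol may differ.

```python
# Hedged sketch of subcompositional restriction ("component removal and
# renormalization"). Names are illustrative, not the paper's API.
import numpy as np

def clr(x):
    """Centered log-ratio transform."""
    logx = np.log(x)
    return logx - logx.mean()

def subcomposition(x, keep):
    """Keep the listed components and renormalize onto the smaller simplex."""
    sub = np.asarray(x, dtype=float)[list(keep)]
    return sub / sub.sum()

p = np.array([0.5, 0.2, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.4, 0.2])

keep = [0, 1, 3]  # drop archetype 2, keep the other three
p_sub, q_sub = subcomposition(p, keep), subcomposition(q, keep)

# Subcompositional coherence: ratios among retained parts are unchanged
# by the renormalization, so log-ratio analyses stay consistent.
assert np.isclose(p_sub[0] / p_sub[1], p[0] / p[1])

# The restricted simplex still carries a well-defined Aitchison distance,
# which is what makes "dimensionality removal" probes geometrically sound.
d_sub = np.linalg.norm(clr(p_sub) - clr(q_sub))
print(f"Aitchison distance on the 3-part subcomposition: {d_sub:.4f}")
```

Repeating a downstream evaluation with different `keep` sets is, roughly, the subcompositional dimensionality-removal probe the abstract describes: if predictions degrade sharply once a group of archetypes is removed, that group carries the structure the model relies on.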