LAG-XAI: A Lie-Inspired Affine Geometric Framework for Interpretable Paraphrasing in Transformer Latent Spaces

arXiv cs.CL / 4/8/2026


Key Points

  • The paper proposes LAG-XAI, a Lie-inspired affine geometric framework that treats paraphrasing as a continuous affine transformation (geometric flow) in Transformer embedding/latent spaces rather than discrete word swaps.
  • It introduces a computationally efficient mean-field approximation inspired by local Lie group actions, decomposing paraphrase transitions into interpretable components: rotation, deformation, and translation.
  • Experiments on the PIT-2015 noisy Twitter corpus (Sentence-BERT embeddings) show a “linear transparency” effect, with the affine operator reaching AUC 0.7713 and retaining about 80% of a non-linear baseline’s effective classification capacity.
  • The method identifies geometric invariants such as a stable reconfiguration angle (~27.84°) and near-zero deformation (suggesting local isometry), and demonstrates cross-corpus generalization via validation on the TURL dataset.
  • As a practical application, LAG-XAI is used for LLM hallucination detection, achieving 95.3% factual distortion detection on HaluEval via a “cheap geometric check” for deviations beyond a semantic corridor.
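The paper itself does not publish an implementation, but the decomposition described above can be sketched as follows: fit a least-squares affine map between paired sentence embeddings, then split the linear part via polar decomposition into a rotation (orthogonal factor) and a deformation (symmetric factor), with the bias term as the translation. The single-plane trace formula for the rotation angle and the synthetic data are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def fit_affine(X, Y):
    """Least-squares affine map Y ~= X @ W + b between paired embeddings."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    sol, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    return sol[:-1], sol[-1]                       # W (d x d), b (d,)

def polar_decompose(W):
    """Polar decomposition W = R @ S: R orthogonal (rotation),
    S symmetric positive semi-definite (deformation)."""
    U, s, Vt = np.linalg.svd(W)
    return U @ Vt, Vt.T @ np.diag(s) @ Vt

def rotation_angle_deg(R):
    """For a rotation acting in a single 2-D plane of a d-dim space,
    trace(R) = (d - 2) + 2*cos(theta); recover theta in degrees."""
    d = R.shape[0]
    c = np.clip((np.trace(R) - (d - 2)) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(c)))

# Synthetic demo: a known 30-degree planar rotation plus a translation.
rng = np.random.default_rng(0)
d, theta = 4, np.radians(30.0)
R_true = np.eye(d)
R_true[:2, :2] = [[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]]
b_true = rng.normal(size=d)
X = rng.normal(size=(200, d))
Y = X @ R_true + b_true

W, b = fit_affine(X, Y)
R, S = polar_decompose(W)
angle = rotation_angle_deg(R)            # recovers ~30 degrees
deformation = np.linalg.norm(S - np.eye(d))  # ~0: local isometry
```

On clean synthetic data the fit recovers the planted rotation angle and a near-identity deformation factor, which is exactly the "near-zero deformation / local isometry" signature the paper reports for real paraphrase pairs.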

Abstract

Modern Transformer-based language models achieve strong performance in natural language processing tasks, yet their latent semantic spaces remain largely uninterpretable black boxes. This paper introduces LAG-XAI (Lie Affine Geometry for Explainable AI), a novel geometric framework that models paraphrasing not as discrete word substitutions, but as a structured affine transformation within the embedding space. By conceptualizing paraphrasing as a continuous geometric flow on a semantic manifold, we propose a computationally efficient mean-field approximation, inspired by local Lie group actions. This allows us to decompose paraphrase transitions into geometrically interpretable components: rotation, deformation, and translation. Experiments on the noisy PIT-2015 Twitter corpus, encoded with Sentence-BERT, reveal a "linear transparency" phenomenon. The proposed affine operator achieves an AUC of 0.7713. By normalizing against random chance (AUC 0.5), the model captures approximately 80% of the non-linear baseline's effective classification capacity (AUC 0.8405), offering explicit parametric interpretability in exchange for a marginal drop in absolute accuracy. The model identifies fundamental geometric invariants, including a stable matrix reconfiguration angle (~27.84°) and near-zero deformation, indicating local isometry. Cross-domain generalization is confirmed via direct cross-corpus validation on an independent TURL dataset. Furthermore, the practical utility of LAG-XAI is demonstrated in LLM hallucination detection: using a "cheap geometric check," the model automatically detected 95.3% of factual distortions on the HaluEval dataset by registering deviations beyond the permissible semantic corridor. This approach provides a mathematically grounded, resource-efficient path toward the mechanistic interpretability of Transformers.
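Two of the abstract's quantitative claims are easy to make concrete. The "~80% of effective capacity" figure follows from normalizing both AUCs against chance, and the "cheap geometric check" amounts to flagging an output whose embedding falls outside a residual corridor around the affine image of the input. The sketch below assumes a simple Euclidean residual and a hypothetical calibrated threshold `tau`; the paper's exact corridor definition may differ.

```python
import numpy as np

def effective_capacity(auc, auc_baseline, chance=0.5):
    """Share of the baseline's above-chance capacity retained by the model."""
    return (auc - chance) / (auc_baseline - chance)

def corridor_check(W, b, x, y, tau):
    """Cheap geometric check: flag y as a potential distortion if it leaves
    the corridor of radius tau around the affine image of x. In practice tau
    would be calibrated on held-out paraphrase residuals (assumption here)."""
    residual = float(np.linalg.norm(y - (x @ W + b)))
    return residual > tau

# AUC normalization from the abstract: (0.7713 - 0.5) / (0.8405 - 0.5) ~ 0.80.
ratio = effective_capacity(0.7713, 0.8405)

# Toy corridor demo with an identity operator (illustrative values only).
W, b, tau = np.eye(3), np.zeros(3), 0.5
x = np.array([1.0, 0.0, 0.0])
flag_paraphrase = corridor_check(W, b, x, x + 0.01, tau)   # stays inside
flag_distortion = corridor_check(W, b, x, np.array([0.0, 0.0, 5.0]), tau)
```

The identity-operator demo is deliberately trivial: in the paper's setting `W` and `b` would be the affine operator fitted on paraphrase pairs, and only the threshold comparison is needed at inference time, which is what makes the check cheap.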