Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval

arXiv cs.AI / 4/15/2026


Key Points

  • The paper argues that representing tables by linearizing them (as in many NLP pipelines) destroys key geometric and relational structure, making models brittle to layout permutations.
  • It introduces the Platonic Representation Hypothesis (PRH), claiming that latent spaces for table reasoning should be intrinsically permutation-invariant to remain semantically stable.
  • The authors propose formal diagnostics for “serialization bias,” including two metrics derived from Centered Kernel Alignment (CKA) to measure embedding drift under structural derangement and convergence toward a canonical latent structure.
  • Empirical results suggest a vulnerability in modern LLM-based approaches: even small table layout changes can cause large, disproportionate shifts in table embeddings, which can undermine RAG systems by making retrieval sensitive to layout noise rather than semantics.
  • To address this, the paper presents a structure-aware table representation learning (TRL) encoder that enforces cell header alignment, improving geometric stability and moving toward permutation-invariant retrieval.
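The CKA-based drift diagnostic in the key points can be sketched as follows. This is a minimal illustration, not the paper's implementation: `linear_cka` is the standard linear Centered Kernel Alignment, while the embeddings and the additive noise simulating a layout derangement are synthetic placeholders.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two embedding matrices of shape (n, d).

    Returns 1.0 when the two representations are identical up to
    rotation/scaling; lower values indicate representational drift.
    """
    X = X - X.mean(axis=0)  # center features
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

rng = np.random.default_rng(0)
# Stand-in for embeddings of 64 tables under the original layout.
emb_original = rng.normal(size=(64, 128))
# Simulate serialization bias: embeddings of the *same* tables after a
# row/column derangement, shifted by layout-dependent noise.
emb_deranged = emb_original + 0.8 * rng.normal(size=(64, 128))

drift = linear_cka(emb_original, emb_deranged)
print(f"CKA(original, deranged) = {drift:.3f}")  # well below 1.0 => layout-sensitive
```

A layout-robust encoder would keep this score near 1.0 under derangement; a brittle, serialization-based one would not.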

Abstract

Historical approaches to Table Representation Learning (TRL) have largely adopted the sequential paradigms of Natural Language Processing (NLP). We argue that this linearization of tables discards their essential geometric and relational structure, creating representations that are brittle to layout permutations. This paper introduces the Platonic Representation Hypothesis (PRH) for tables, positing that a semantically robust latent space for table reasoning must be intrinsically Permutation Invariant (PI). To ground this hypothesis, we first conduct a retrospective analysis of table-reasoning tasks, highlighting the pervasive serialization bias that compromises structural integrity. We then propose a formal framework to diagnose this bias, introducing two principled metrics based on Centered Kernel Alignment (CKA): (i) PI, which measures embedding drift under complete structural derangement, and (ii) rho, a Spearman-based metric that tracks the convergence of latent structures toward a canonical form as structural information is incrementally restored. Our empirical analysis quantifies a predicted flaw in modern Large Language Models (LLMs): even minor layout permutations induce large, disproportionate semantic shifts in their table embeddings. This exposes a fundamental vulnerability in RAG systems, whose table retrieval becomes sensitive to layout-dependent noise rather than to semantic content. In response, we present a novel, structure-aware TRL encoder architecture that explicitly enforces the cognitive principle of cell header alignment. This model demonstrates superior geometric stability and moves toward the PI ideal. Our work provides both a foundational critique of linearized table encoders and the theoretical scaffolding for semantically stable, permutation-invariant retrieval, charting a new direction for table reasoning in information systems.
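To make the cell-header-alignment idea concrete, here is a minimal, hypothetical sketch of a permutation-invariant table embedding: each cell vector is tied to its header vector, and the pairs are mean-pooled, so neither row order nor column order affects the output. The hash-seeded token embedder is a toy stand-in for illustration only, not the paper's encoder architecture.

```python
import numpy as np

def embed_token(text, dim=64):
    """Toy token embedder: deterministic within a process (hash-seeded)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

def embed_table(headers, rows, dim=64):
    """Permutation-invariant table embedding.

    Each cell is summed with its column header's vector (a stand-in for
    "cell header alignment"), then all cell vectors are mean-pooled.
    Mean pooling over the set of (header, cell) pairs is order-free, so
    permuting rows or columns leaves the embedding unchanged.
    """
    cell_vecs = [
        embed_token(h, dim) + embed_token(str(cell), dim)
        for row in rows
        for h, cell in zip(headers, row)
    ]
    return np.mean(cell_vecs, axis=0)

headers = ["city", "population"]
rows = [["Oslo", 709000], ["Bergen", 286000]]

v1 = embed_table(headers, rows)
# Reverse the rows AND swap the columns: the embedding is unchanged.
v2 = embed_table(headers[::-1], [[r[1], r[0]] for r in rows[::-1]])
print(np.allclose(v1, v2))  # True
```

A linearized encoder, by contrast, would see two different token sequences for these two layouts and generally produce two different embeddings, which is exactly the serialization bias the paper diagnoses.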