TabEmb: Joint Semantic-Structure Embedding for Table Annotation
arXiv cs.LG / 4/22/2026
📰 News · Models & Research
Key Points
- The paper addresses table annotation, noting that tables require representations that jointly capture each column’s semantics and the relationships between columns, whereas for plain text, semantic embeddings alone often suffice.
- It argues that prior approaches that flatten 2D tables into 1D token sequences and encode them with pretrained language models such as BERT suffer from weaker semantic quality, poorer generalization to unseen or rare values, and degraded structural modeling (see the flattening sketch after this list).
- TabEmb improves on this by decoupling semantic encoding from structural modeling: an LLM generates semantic embeddings per column, and a graph-based module models inter-column relationships to produce joint semantic-structural representations (see the second sketch below).
- Experiments indicate that TabEmb consistently outperforms strong baselines across multiple table annotation tasks, and the authors provide code and datasets.
- The work is positioned as an alternative to 2D-to-1D flattening that avoids context-length limitations by preserving structured interactions via a graph over columns.
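
As a concrete illustration of the flattening approach critiqued above, here is a minimal sketch, assuming a pandas table and a BERT-style encoder from Hugging Face; the table contents and model name are illustrative and not taken from the paper. Note how the entire table competes for a single bounded token budget, which is the context-length issue the key points mention.

```python
# Minimal sketch of 2D-to-1D flattening: serialize a table column-by-column
# into one token sequence and encode it with a pretrained language model.
# The model name and serialization template are illustrative assumptions.
import pandas as pd
from transformers import AutoTokenizer, AutoModel

table = pd.DataFrame({
    "City": ["Berlin", "Tokyo", "Lima"],
    "Population": [3_645_000, 13_960_000, 9_752_000],
})

# Flatten: "City is Berlin , Tokyo , Lima . Population is 3645000 , ..."
serialized = " . ".join(
    f"{col} is " + " , ".join(table[col].astype(str))
    for col in table.columns
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# The whole table shares one fixed-length context window (truncated at 512).
inputs = tokenizer(serialized, return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)                      # [1, seq_len, 768]
table_repr = outputs.last_hidden_state[:, 0]   # [CLS] embedding for the whole table
```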

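By contrast, a hedged sketch of the decoupled design described in the key points could look like the following: each column is embedded independently for semantics, then a small graph layer mixes information across columns for structure. The sentence encoder standing in for the LLM embedder, the fully connected column graph, and the single GCN-style layer are assumptions for illustration, not the authors' implementation.

```python
# Sketch of a decoupled semantic + structural pipeline over table columns.
# Model names, dimensions, and the one-layer graph module are illustrative.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

class ColumnGraphLayer(nn.Module):
    """One GCN-style message-passing step over a graph of columns."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, col_emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Average each column's neighbors, project, and keep a residual
        # connection so the original column semantics are preserved.
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        mixed = (adj / deg) @ col_emb
        return torch.relu(self.proj(mixed)) + col_emb

# 1) Semantic encoding: embed each column on its own (header + sample values).
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed stand-in for the LLM embedder
columns = [
    "City: Berlin, Tokyo, Lima",
    "Population: 3645000, 13960000, 9752000",
]
col_emb = torch.tensor(encoder.encode(columns))    # [num_cols, 384]

# 2) Structural modeling: fully connected graph over columns (no self-loops).
num_cols = col_emb.size(0)
adj = torch.ones(num_cols, num_cols) - torch.eye(num_cols)

joint_repr = ColumnGraphLayer(dim=col_emb.size(1))(col_emb, adj)
# joint_repr now carries each column's own semantics plus signals from
# related columns; an annotation head would classify each row of it.
```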

