ZTab: Domain-based Zero-shot Annotation for Table Columns

arXiv cs.LG / 3/13/2026

Key Points

  • ZTab proposes a domain-based zero-shot framework to automatically annotate semantic column types in relational tables without requiring user-provided labeled data, addressing privacy concerns and labeling costs.
  • It generates pseudo-tables from sample schemas and fine-tunes an annotation LLM on them to enable domain-aware zero-shot labeling.
  • The domain configuration offers a trade-off between zero-shot breadth and annotation performance, with a universal domain approaching pure zero-shot and a specialized domain achieving better accuracy within a given application.
  • The approach aims to reduce reliance on high-performance closed-source LLMs and needs no retraining for test tables from a similar domain. Source code and datasets are available on GitHub for reproducibility.
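
The pseudo-table step in the key points can be pictured with a minimal sketch. The summary does not say how ZTab populates pseudo-table cells or how it formats fine-tuning examples, so everything below is an assumption for illustration: the value pools (VALUE_POOLS), the helper names (make_pseudo_table, to_training_examples), and the prompt layout are all hypothetical stand-ins, not the authors' implementation.

```python
import random

# Hypothetical value pools per semantic type. The source does not specify how
# pseudo-table cells are produced; a fixed pool is used here only as a stand-in.
VALUE_POOLS = {
    "city": ["Paris", "Lagos", "Osaka", "Lima"],
    "country": ["France", "Nigeria", "Japan", "Peru"],
    "year": ["1998", "2005", "2012", "2021"],
}


def make_pseudo_table(schema, n_rows, rng):
    """Build one synthetic column of n_rows values for each type in the schema."""
    return [(sem_type, [rng.choice(VALUE_POOLS[sem_type]) for _ in range(n_rows)])
            for sem_type in schema]


def to_training_examples(table, semantic_types):
    """Turn each column into a prompt/label pair for supervised fine-tuning."""
    options = ", ".join(sorted(semantic_types))
    examples = []
    for label, values in table:
        prompt = (f"Column values: {' | '.join(values)}\n"
                  f"Choose one type from: {options}")
        examples.append({"prompt": prompt, "label": label})
    return examples


rng = random.Random(0)  # seeded so the sketch is reproducible
schema = ["city", "country", "year"]  # one sample schema from the domain
table = make_pseudo_table(schema, n_rows=3, rng=rng)
examples = to_training_examples(table, VALUE_POOLS.keys())
```

Each entry in examples pairs a serialized column with its known semantic type, which is the kind of supervision an annotation LLM could be fine-tuned on without any user-provided labels.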

Abstract

This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user-provided labeled training data, making it ideal for scenarios where data collection is costly or restricted due to privacy concerns. However, existing zero-shot models suffer from poor performance when the number of semantic column types is large, limited understanding of tabular structure, and privacy risks arising from dependence on high-performance closed-source LLMs. We introduce ZTab, a domain-based zero-shot framework that addresses both performance and zero-shot requirements. Given a domain configuration consisting of a set of predefined semantic types and sample table schemas, ZTab generates pseudo-tables for the sample schemas and fine-tunes an annotation LLM on them. ZTab is domain-based zero-shot in that it does not depend on user-specific labeled training data; therefore, no retraining is needed for a test table from a similar domain. We describe three cases of domain-based zero-shot. The domain configuration of ZTab provides a trade-off between the extent of zero-shot and annotation performance: a "universal domain" that contains all semantic types approaches "pure" zero-shot, while a "specialized domain" that contains semantic types for a specific application enables better zero-shot performance within that domain. Source code and datasets are available at https://github.com/hoseinzadeehsan/ZTab
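
To make the domain configuration concrete, here is a minimal sketch of how a set of predefined semantic types and sample table schemas might be represented, and how a specialized domain relates to a universal one. The DomainConfig class, the specialize helper, and the example type names are assumptions made for illustration; the abstract does not describe ZTab's actual configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Hypothetical container for a domain: a label set plus sample schemas."""
    name: str
    semantic_types: frozenset  # predefined semantic types (the label space)
    sample_schemas: tuple      # each schema is a tuple of semantic types

    def __post_init__(self):
        # Every sample schema may only use types the domain defines.
        for schema in self.sample_schemas:
            unknown = set(schema) - self.semantic_types
            if unknown:
                raise ValueError(
                    f"{self.name}: schema uses undefined types {sorted(unknown)}")


def specialize(domain, name, keep_types):
    """Derive a narrower domain by keeping a subset of types and the schemas they cover."""
    keep = frozenset(keep_types) & domain.semantic_types
    schemas = tuple(s for s in domain.sample_schemas if set(s) <= keep)
    return DomainConfig(name, keep, schemas)


universal = DomainConfig(
    name="universal",
    semantic_types=frozenset({"city", "country", "year", "team", "player", "score"}),
    sample_schemas=(("city", "country"), ("team", "player", "score"), ("player", "year")),
)
sports = specialize(universal, "sports", {"team", "player", "score", "year"})
```

In this sketch the sports domain has fewer candidate types than the universal one, which mirrors the stated trade-off: a smaller label space covers fewer tables but gives the annotator fewer types to confuse.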