Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning

arXiv cs.CL / 3/25/2026


Key Points

  • The paper introduces “Table-LLM-Specialist,” a self-trained fine-tuning approach aimed at improving language model performance on complex table tasks like NL-to-Code and data cleaning without costly human labels.
  • It exploits the dual formulations that many table tasks admit (a generative version and a classification version), using a Generator–Validator strategy in which the model iteratively generates synthetic training examples and validates them before fine-tuning.
  • Experiments across Llama and OpenAI GPT models (GPT-3.5 and GPT-4) indicate that the method improves table-task quality, with fine-tuned GPT-3.5 models sometimes reaching or exceeding GPT-4-level performance.
  • The approach is reported to reduce deployment cost and latency by allowing smaller models to achieve high quality, while also improving generalization through diverse, systematically generated data.
  • Microsoft states that the fine-tuned models have been integrated into Excel and deployed in production for automated table data cleaning, and the authors provide code via GitHub.

Abstract

Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code and data cleaning) remains suboptimal. Improving performance typically requires task-specific fine-tuning, which depends on expensive human labeling and is prone to overfitting. In this work, we propose Table-LLM-Specialist, a self-trained fine-tuning paradigm designed for table tasks. Our key insight is that many table tasks admit two dual formulations: a generative version and a classification version. Leveraging this duality, we introduce a Generator-Validator paradigm that iteratively generates and validates training data using language models, enabling effective fine-tuning without manually labeled data. Extensive evaluations on Llama, GPT-3.5, and GPT-4 show that Table-LLM-Specialist achieves (1) strong performance across diverse tasks compared to base models, for example, models fine-tuned on GPT-3.5 often surpass GPT-4 level quality; (2) lower deployment cost by enabling smaller models to reach high quality with reduced latency and cost; and (3) better generalization across multiple benchmarks, due to training on diverse, systematically generated data from real-world tables. Our code is available at https://github.com/microsoft/Table-Specialist. Models fine-tuned with Table-LLM-Specialist have been integrated into Microsoft Excel and are deployed in production for automated table data cleaning.
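The Generator-Validator loop described above can be sketched in miniature. The summary does not give the paper's prompts or APIs, so the sketch below replaces the LLM calls with toy stand-ins for a hypothetical "sum a column" table task; the function names and the acceptance logic are illustrative assumptions, not the authors' implementation.

```python
import random

# Toy stand-ins for LLM calls. In the real system, generate_candidate and
# validate_candidate would be chat-completion requests against the model
# being specialized (assumption: exact prompts/APIs are not in this summary).

def generate_candidate(table):
    """Generative formulation: propose an (instruction, answer) pair for a table."""
    col = random.choice(list(table.keys()))
    return {"instruction": f"Sum column {col}", "answer": sum(table[col])}

def validate_candidate(table, candidate):
    """Classification formulation (the dual task): accept only answers that
    check out when re-verified against the table."""
    col = candidate["instruction"].split()[-1]
    return sum(table[col]) == candidate["answer"]

def specialist_round(tables, per_table=3):
    """One Generator-Validator iteration: synthesize candidates, keep only the
    validated ones, and return them as fine-tuning data. In the paper, the
    model is fine-tuned on this data and the loop repeats."""
    accepted = []
    for table in tables:
        for _ in range(per_table):
            cand = generate_candidate(table)
            if validate_candidate(table, cand):
                accepted.append(cand)
    return accepted
```

Because the validator is a filter rather than a labeler, imperfect generations are simply discarded, which is what lets the loop run without human-labeled data.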