Prior-Aligned Data Cleaning for Tabular Foundation Models

arXiv cs.LG / April 29, 2026


Key Points

  • Tabular Foundation Models (TFMs) achieve strong zero-shot accuracy by meta-learning over synthetic data-generating processes, but real-world issues such as missing values, outliers, and duplicates create a “prior mismatch” that hurts both accuracy and confidence calibration.
  • The paper introduces L2C2, a deep reinforcement learning framework that treats tabular data cleaning as prior alignment: a learned policy sequentially applies cleaning operators to minimize the distributional gap between the input data and the TFM’s synthetic prior (see the sketch after this list).
  • Experiments on 10 OpenML datasets show that reward design is non-trivial: three of seven reward formulations collapse to degenerate, trivial cleaning strategies, while the proposed TFMAwareReward improves TFM accuracy wherever it selects structurally distinct pipelines, and never underperforms.
  • Parameterized cleaning actions yield better best-found pipeline rewards on 9/10 datasets, and a policy pre-trained on a single dataset transfers effectively, outperforming training from scratch at an early fine-tuning checkpoint and by up to +28.8% after full fine-tuning.
  • Overall, the results position prior-aligned sequential cleaning as a principled data preparation approach for deploying TFMs on messy real-world tabular data.
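
To make the sequential-cleaning framing concrete, below is a minimal Python sketch of cleaning as an RL environment. This is not the authors' implementation: the operator set, the sparse end-of-episode reward, and the episode length are illustrative assumptions, and `clip_outliers` shows what a parameterized action (a continuous clipping threshold `k`) looks like.

```python
# Minimal sketch (not the paper's code) of tabular cleaning as a
# sequential decision process. Operators and episode structure are
# illustrative assumptions; a learned policy (e.g., PPO) picks actions.
from dataclasses import dataclass
from typing import Callable
import pandas as pd

# Hypothetical cleaning operators; each maps a DataFrame to a DataFrame.
def impute_mean(df: pd.DataFrame) -> pd.DataFrame:
    return df.fillna(df.mean(numeric_only=True))

def drop_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    return df.drop_duplicates()

def clip_outliers(df: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    # A *parameterized* action: winsorize numeric columns at mean +/- k std.
    num = df.select_dtypes("number")
    lo, hi = num.mean() - k * num.std(), num.mean() + k * num.std()
    out = df.copy()
    out[num.columns] = num.clip(lo, hi, axis=1)
    return out

OPERATORS = [impute_mean, drop_duplicates, clip_outliers]

@dataclass
class CleaningEnv:
    """One episode = a bounded sequence of operators applied to one table."""
    df: pd.DataFrame
    reward_fn: Callable[[pd.DataFrame], float]  # e.g., a TFM-aware reward
    max_steps: int = 8
    t: int = 0

    def step(self, action: int):
        self.df = OPERATORS[action](self.df)
        self.t += 1
        done = self.t >= self.max_steps
        # Sparse reward: score only the final cleaned table, so the
        # signal reflects the whole pipeline rather than single steps.
        reward = self.reward_fn(self.df) if done else 0.0
        return self.df, reward, done
```

The paper's parameterized-action result suggests the policy would also emit operator parameters such as `k`, rather than choosing only among fixed operators as in this simplified sketch.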

Abstract

Tabular Foundation Models (TFMs) achieve state-of-the-art zero-shot accuracy on small tabular datasets by meta-learning over synthetic data-generating processes -- making them highly attractive for practitioners who cannot afford large annotated corpora. However, their in-context learning mechanism assumes approximately clean inputs: missing values, outliers, and duplicates in real-world data create a prior mismatch that degrades both accuracy and confidence calibration. Correcting this mismatch requires sequential decisions over cleaning operators whose interactions no static preprocessing rule can anticipate -- a natural fit for reinforcement learning (RL). We introduce L2C2, the first deep RL framework to frame tabular data cleaning as prior alignment: a learned policy sequences operators to minimize the distributional gap between the dirty input and the TFM's synthetic prior. Six experiments on ten OpenML benchmark datasets establish: 1) three of seven reward designs collapse to degenerate, trivial cleaning strategies -- principled reward engineering is scientifically non-trivial; 2) our proposed TFMAwareReward selects structurally distinct pipelines on 4/10 datasets and achieves higher TabPFN accuracy on those diverging cases (mean 0.851 vs. 0.843; Wilcoxon p=0.063, n=4) while never underperforming; 3) parameterized cleaning actions improve the best-found pipeline reward on 9/10 datasets (Wilcoxon p=0.004); and 4) a policy pre-trained on a single source dataset exceeds scratch training at the 2,000-step fine-tuning checkpoint on all three held-out datasets (up to +28.8% after full fine-tuning), demonstrating cross-dataset transfer of prior-alignment knowledge. These findings establish that prior alignment is a principled data preparation strategy for TFM deployment on real-world tabular data.
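
The abstract does not give the TFMAwareReward formula, but its role -- scoring a candidate pipeline by how well the cleaned table serves the downstream TFM rather than by surface tidiness -- can be sketched as below. This is a plausible reading, not the paper's definition: the holdout split, the use of accuracy alone, and the size penalty guarding against the degenerate "delete everything" strategies the paper reports are all assumptions.

```python
# Hedged sketch of a TFM-aware reward, assuming the open-source TabPFN
# package (pip install tabpfn). The exact TFMAwareReward may combine
# further terms (e.g., calibration); this shows only the core idea.
import pandas as pd
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

def tfm_aware_reward(df: pd.DataFrame, target: str, seed: int = 0) -> float:
    X = df.drop(columns=[target]).to_numpy(dtype=float)
    y = df[target].to_numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    clf = TabPFNClassifier()     # zero-shot in-context learner
    clf.fit(X_tr, y_tr)          # "fit" just stores the context set
    acc = float((clf.predict(X_te) == y_te).mean())
    # Guard against reward hacking: a pipeline that deletes nearly all
    # rows should not look attractive (hypothetical penalty term).
    return acc - (1.0 if len(df) < 20 else 0.0)
```

A reward like this would plug directly into the `reward_fn` slot of the environment sketched above, closing the loop between cleaning decisions and the TFM's behavior on the cleaned data.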