SYNTHONY: A Stress-Aware, Intent-Conditioned Agent for Deep Tabular Generative Models Selection

arXiv cs.LG / 4/2/2026


Key Points

  • The paper introduces SYNTHONY, a stress-aware, intent-conditioned framework that selects the best deep generative synthesizer family for tabular data based on dataset “stressors” and a user’s metric preferences.
  • It proposes “stress profiling,” using synthesis-specific meta-features across four interpretable stress dimensions (e.g., long-tailed marginals, high-cardinality categoricals, Zipfian imbalance, and small-sample regimes) to predict which model family will perform best.
  • In experiments on 7 datasets, 10 synthesizer families, and 3 intents, a kNN selector over stress meta-features achieves strong Top-1 selection accuracy, outperforming zero-shot LLM-based selection and random baselines.
  • The authors show that the main limitation comes from a hand-crafted capability registry used to calibrate synthesizer strengths, and they argue learned capability representations could close the performance gap.
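
As a rough illustration of the kNN selection idea in the key points above, the sketch below votes among the nearest previously benchmarked datasets in stress-profile space. It is not the authors' implementation: the four-dimensional profiles (assumed pre-scaled to [0, 1]), the family labels, the function name `knn_select`, and the choice of Euclidean distance with majority voting are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_select(query, profiles, best_family, k=3):
    """Pick a synthesizer family by majority vote among the k stress
    profiles closest (Euclidean distance) to the query profile."""
    ranked = sorted(
        (math.dist(query, profile), family)
        for profile, family in zip(profiles, best_family)
    )
    votes = Counter(family for _, family in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical benchmark library. Each profile is
# (tail heaviness, categorical cardinality, Zipfian imbalance, sample scarcity),
# and each label is the family that scored best on that dataset.
profiles = [
    (0.9, 0.1, 0.2, 0.1),
    (0.8, 0.2, 0.1, 0.2),
    (0.1, 0.9, 0.8, 0.1),
    (0.2, 0.8, 0.9, 0.2),
    (0.1, 0.1, 0.1, 0.9),
]
best_family = ["diffusion", "diffusion", "gan", "gan", "llm"]

# A new dataset with heavy tails and few other stressors.
choice = knn_select((0.85, 0.15, 0.15, 0.1), profiles, best_family, k=3)
print(choice)
```

In a setting with several intents, one such library of labels would presumably be kept per intent, since the best family can change with the metric preference.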

Abstract

Deep generative models for tabular data (GANs, diffusion models, and LLM-based generators) exhibit highly non-uniform behavior across datasets; the best-performing synthesizer family depends strongly on distributional stressors such as long-tailed marginals, high-cardinality categoricals, Zipfian imbalance, and small-sample regimes. This brittleness makes practical deployment challenging, especially when users must balance competing objectives of fidelity, privacy, and utility. We study intent-conditioned tabular synthesis selection: given a dataset and a user intent expressed as a preference over evaluation metrics, the goal is to select a synthesizer that minimizes regret relative to an intent-specific oracle. We propose stress profiling, a synthesis-specific meta-feature representation that quantifies dataset difficulty along four interpretable stress dimensions, and integrate it into SYNTHONY, a selection framework that matches stress profiles against a calibrated capability registry of synthesizer families. Across a benchmark of 7 datasets, 10 synthesizers, and 3 intents, we demonstrate that stress-based meta-features are highly predictive of synthesizer performance: a kNN selector using these features achieves strong Top-1 selection accuracy, substantially outperforming zero-shot LLM selectors and random baselines. We analyze the gap between meta-feature-based and capability-based selection, identifying the hand-crafted capability registry as the primary bottleneck and motivating learned capability representations as a direction for future work.
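
To make the notion of regret relative to an intent-specific oracle concrete, here is a minimal sketch. It assumes an intent is a set of non-negative weights over fidelity, privacy, and utility and that scores are combined by a weighted sum; the abstract does not specify the scalarization, so that choice, the function names, and all numbers are hypothetical.

```python
def intent_score(metrics, intent):
    """Weighted sum of metric values under the intent's weights (assumed form)."""
    return sum(weight * metrics[name] for name, weight in intent.items())

def regret(results, intent, chosen):
    """Return (oracle synthesizer, score gap between the oracle and `chosen`)."""
    scores = {name: intent_score(m, intent) for name, m in results.items()}
    oracle = max(scores, key=scores.get)
    return oracle, scores[oracle] - scores[chosen]

# Hypothetical per-synthesizer metrics on one dataset (higher is better).
results = {
    "gan":       {"fidelity": 0.6, "privacy": 0.9, "utility": 0.5},
    "diffusion": {"fidelity": 0.9, "privacy": 0.5, "utility": 0.8},
}

# Two hypothetical intents: one favors privacy, the other fidelity.
privacy_first = {"fidelity": 0.2, "privacy": 0.6, "utility": 0.2}
fidelity_first = {"fidelity": 0.6, "privacy": 0.2, "utility": 0.2}

oracle_p, regret_p = regret(results, privacy_first, chosen="diffusion")
oracle_f, regret_f = regret(results, fidelity_first, chosen="diffusion")
print(oracle_p, round(regret_p, 2))
print(oracle_f, round(regret_f, 2))
```

The same selection ("diffusion") incurs nonzero regret under the privacy-weighted intent and zero regret under the fidelity-weighted one, which is why the oracle has to be defined per intent.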