Benchmarking Tabular Foundation Models for Conditional Density Estimation in Regression
arXiv cs.LG · March 30, 2026
Key Points
- The paper benchmarks tabular foundation model variants (e.g., TabPFN/TabICL-style models) for conditional density estimation (CDE) in regression, focusing on recovering full predictive distributions rather than point estimates.
- Evaluations on 39 real-world tabular datasets across training sizes (50 to 20,000) and multiple baselines show foundation models generally achieve the best density accuracy, log-likelihood, and CRPS, indicating strong off-the-shelf CDE performance.
- Calibration is competitive at small sample sizes, but for certain datasets/metrics it can lag specialized neural CDE baselines as data size grows, implying that post-hoc recalibration may improve reliability.
- In an SDSS DR18 photometric redshift case study, a TabPFN variant trained on 50,000 galaxies outperforms baselines trained on the full 500,000-galaxy dataset, suggesting sample-efficiency benefits.
- The results position tabular foundation models as effective general-purpose conditional density estimators, filling a gap: unlike point prediction, their CDE performance had not previously been systematically assessed.
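The benchmark scores predictive distributions with log-likelihood and CRPS. As a minimal sketch of what those metrics compute, the snippet below evaluates both for a Gaussian predictive distribution, using the closed-form CRPS for the normal case (Gneiting & Raftery, 2007). This is purely illustrative: the paper's foundation models output richer, non-Gaussian densities, and the function names here are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def gaussian_nll(mu, sigma, y):
    """Average negative log-likelihood of observations y under N(mu, sigma^2).
    Lower is better; this is the (negated) log-likelihood metric."""
    return -np.mean(norm.logpdf(y, loc=mu, scale=sigma))

def gaussian_crps(mu, sigma, y):
    """Average CRPS of observations y under N(mu, sigma^2), via the
    closed-form expression for a Gaussian forecast. Lower is better."""
    z = (np.asarray(y) - mu) / sigma
    return np.mean(sigma * (z * (2 * norm.cdf(z) - 1)
                            + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi)))

# Illustrative use: a well-centered, sharp forecast scores lower (better)
# on both metrics than a miscalibrated one.
y_obs = np.array([0.1, -0.2, 0.05])
good = gaussian_crps(0.0, 1.0, y_obs)
bad = gaussian_crps(5.0, 1.0, y_obs)
```

For non-Gaussian predictive densities (as produced by the benchmarked models), CRPS is typically computed from samples or from the predictive CDF on a grid rather than in closed form.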