Multivariate Uncertainty Quantification with Tomographic Quantile Forests

arXiv stat.ML / 4/2/2026

Key Points

  • The paper introduces Tomographic Quantile Forests (TQF), a nonparametric, uncertainty-aware tree-based model for multivariate predictive distributions.
  • TQF learns conditional quantiles of directional projections of the target (nᵀy) as functions of inputs x and unit directions n, enabling reconstruction of full multivariate conditional distributions at inference.
  • It aggregates across many directions and reconstructs the distribution by minimizing sliced Wasserstein distance using an efficient alternating optimization with convex subproblems.
  • The approach avoids limitations of prior directional-quantile methods by using a single model that covers all directions, without enforcing convex quantile-region constraints.
  • The authors evaluate TQF on both synthetic and real-world datasets and release the source code on GitHub for reproducibility.
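
To make the quantities in the second point concrete, here is a minimal sketch of what "quantiles of directional projections" means for a fixed sample of targets. It is not the authors' implementation: there is no forest and no dependence on x, and the names `sample_unit_directions` and `directional_quantiles` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def sample_unit_directions(num_directions, dim, rng):
    """Draw directions uniformly on the unit sphere by normalizing Gaussian vectors."""
    v = rng.standard_normal((num_directions, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def directional_quantiles(y, directions, levels):
    """Empirical quantiles of the projections n^T y, one row per direction n.

    y: (num_samples, dim), directions: (num_directions, dim), levels: (num_levels,)
    Returns an array of shape (num_directions, num_levels).
    """
    projections = y @ directions.T  # (num_samples, num_directions)
    # np.quantile returns (num_levels, num_directions); transpose to one row per direction.
    return np.quantile(projections, levels, axis=0).T

rng = np.random.default_rng(0)
# Toy 2-D targets: wide along the first axis, narrow along the second (illustrative data).
y = rng.standard_normal((5000, 2)) * np.array([3.0, 1.0])

axis_directions = np.array([[1.0, 0.0], [0.0, 1.0]])
levels = np.array([0.1, 0.5, 0.9])
q = directional_quantiles(y, axis_directions, levels)

random_directions = sample_unit_directions(8, 2, rng)
q_random = directional_quantiles(y, random_directions, levels)
```

Each row of `q` describes the spread of the targets along one direction. According to the summary above, TQF learns such quantiles as a function of both the input x and the direction n, so a single model can be queried along any direction.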

Abstract

Quantifying predictive uncertainty is essential for safe and trustworthy real-world AI deployment. Yet, fully nonparametric estimation of conditional distributions remains challenging for multivariate targets. We propose Tomographic Quantile Forests (TQF), a nonparametric, uncertainty-aware, tree-based regression model for multivariate targets. TQF learns conditional quantiles of directional projections $\mathbf{n}^{\top}\mathbf{y}$ as functions of the input $\mathbf{x}$ and the unit direction $\mathbf{n}$. At inference, it aggregates quantiles across many directions and reconstructs the multivariate conditional distribution by minimizing the sliced Wasserstein distance via an efficient alternating scheme with convex subproblems. Unlike classical directional-quantile approaches that typically produce only convex quantile regions and require training separate models for different directions, TQF covers all directions with a single model without imposing convexity restrictions. We evaluate TQF on synthetic and real-world datasets, and release the source code on GitHub.
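
The sliced Wasserstein distance named in the abstract can be illustrated with a short Monte Carlo estimate between two point clouds of equal size. This is a sketch under stated assumptions, not the paper's reconstruction procedure: it only evaluates the distance and does not perform the alternating minimization. The function name `sliced_wasserstein2`, the order-2 choice, and the number of directions are assumptions made for illustration.

```python
import numpy as np

def sliced_wasserstein2(a, b, num_directions=256, seed=0):
    """Monte Carlo estimate of the order-2 sliced Wasserstein distance.

    a, b: arrays of shape (num_points, dim) with the same number of points.
    For equal-size samples, the 1-D squared W2 distance along a direction is
    the mean squared difference between the sorted projections.
    """
    if a.shape != b.shape:
        raise ValueError("this sketch assumes point clouds of identical shape")
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((num_directions, a.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions n
    pa = np.sort(a @ dirs.T, axis=0)  # sorted projections n^T a, one column per direction
    pb = np.sort(b @ dirs.T, axis=0)
    # Average squared 1-D distances over points and directions, then take the root.
    return np.sqrt(np.mean((pa - pb) ** 2))

rng = np.random.default_rng(1)
cloud = rng.standard_normal((1000, 2))
shift = np.array([2.0, 0.0])

d_same = sliced_wasserstein2(cloud, cloud)
d_shift = sliced_wasserstein2(cloud, cloud + shift)
```

For a pure translation by a vector s in d dimensions, the distance along direction n is |nᵀs|, so the estimate approaches ‖s‖/√d as the number of directions grows (about 1.41 here). Because the distance only depends on one-dimensional projections, it pairs naturally with a model that predicts quantiles of nᵀy.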