Generalized Bayesian Additive Regression Trees: Theory and Software

arXiv stat.ML / 2026/3/24

💬 オピニオンIdeas & Deep AnalysisTools & Practical UsageModels & Research

要点

  • The paper presents a generalized Bayesian framework for Bayesian Additive Regression Trees (BART), modeling responses drawn from exponential family distributions rather than only continuous/binary outcomes.
  • It derives sufficient conditions under which the posterior distribution concentrates at a minimax rate (up to a logarithmic factor), providing theoretical support for the empirical effectiveness of BART variants.
  • The framework unifies multiple existing extensions of BART by covering many response types, including categorical/count-style data through appropriate exponential-family choices.
  • To enable practical adoption, the authors release a Python package (also usable from R via reticulate) implementing GBART for several exponential-family distributions such as Poisson, Inverse Gaussian, and Gamma, in addition to standard regression/classification.
  • The software’s user-friendly interface is intended to make it straightforward for practitioners to fit BART models across a broader range of statistical response settings.

Abstract

Bayesian Additive Regression Trees (BART) are a powerful ensemble learning technique for modeling nonlinear regression functions. Although initially BART was proposed for predicting only continuous and binary response variables, over the years multiple extensions have emerged that are suitable for estimating a wider class of response variables (e.g. categorical and count data) in a multitude of application areas. In this paper we describe a generalized framework for Bayesian trees and their additive ensembles where the response variable comes from an exponential family distribution and hence encompasses many prominent variants of BART. We derive sufficient conditions on the response distribution, under which the posterior concentrates at a minimax rate, up to a logarithmic factor. In this regard our results provide theoretical justification for the empirical success of BART and its variants. To support practitioners, we develop a Python package, also accessible in R via reticulate, that implements GBART for a range of exponential family response variables including Poisson, Inverse Gaussian, and Gamma distributions, alongside the standard continuous regression and binary classification settings. The package provides a user-friendly interface, enabling straightforward implementation of BART models across a broad class of response distributions.

Generalized Bayesian Additive Regression Trees: Theory and Software | AI Navigate