Bayesian Additive Regression Trees (BART)?

This question might be silly, but I thought I should ask anyway. Is it possible to implement BART in Stan? If it is possible, has someone done it or does someone plan to do it? If it is not possible, could someone give me a high-level explanation of why?


Regression trees are discontinuous and any variant of HMC requires the posterior density to be differentiable with respect to the unknown parameters.
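To make the discontinuity concrete, here's a minimal sketch (plain Python, all names hypothetical): a depth-1 regression tree's prediction is a step function of its split threshold `c`, so the output jumps as `c` crosses a data point and the gradient HMC needs is zero or undefined.

```python
def tree_predict(x, c, left_val, right_val):
    """A depth-1 regression tree: piecewise constant in x AND in the split c."""
    return left_val if x < c else right_val

# Prediction at a fixed data point x = 1.0 as the split threshold c varies:
# the output jumps from right_val to left_val when c crosses x.
x = 1.0
outputs = [tree_predict(x, c, left_val=-2.0, right_val=3.0)
           for c in (0.9, 0.99, 1.01, 1.1)]
# outputs: [3.0, 3.0, -2.0, -2.0] -- a step, so d(output)/dc is 0 or undefined
```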


If your python-fu is sharp and you’re really willing to use HMC-type algorithms to fit your models you may try Discontinuous HMC, an implementation of which lives here. Of course, @betanalpha may have his own geometric qualms with that, but it seems promising to me.


I imagine you could come up with some kind of continuous-density variant, but it sounds like a dissertation-scale project.

You might want to look at the github repository linked in this post!

EDIT: They illustrate BART in the “Case study” subdirectory.


Like many other non-parametric models, nothing can do full Bayesian inference for BART; it's combinatorially intractable. The best that's likely to happen is that you'll find a local mode, or even a few thousand of them, and noodle around there.

Like neural networks (this decade's replacement for additive regression trees) or latent Dirichlet allocation, which also can't be fit with full Bayes, you might still find BART useful in applications. Just make sure to call it "approximate Bayes" (MCMC isn't approximate in the same sense: with enough computation, you can almost surely get any precision you want out of MCMC).

Did you have something concrete in mind? The priors are over things like discrete tree structures and are too intractable to marginalize. The whole point of these tree methods is to divide the space up into ever finer discrete subregions, use branching to decide which subregion applies for a given tree, then sum a bunch of trees together.
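As a rough sketch of that sum-of-trees structure (plain Python, all names and the tuple encoding hypothetical): each tree routes an input to exactly one leaf via discrete branching, and the prediction is the sum of the chosen leaf values. The intractable part is the posterior over the discrete tree structures, not this forward evaluation.

```python
# Each tree: nested (feature_index, threshold, left, right) tuples; leaves are floats.
def eval_tree(node, x):
    """Route x down one tree via discrete branching to a single leaf value."""
    if isinstance(node, float):           # leaf
        return node
    feat, thresh, left, right = node      # internal node
    return eval_tree(left if x[feat] < thresh else right, x)

def bart_predict(trees, x):
    """Sum-of-trees point prediction: add up each tree's selected leaf."""
    return sum(eval_tree(t, x) for t in trees)

# Two tiny hypothetical trees over a 2-feature input
trees = [
    (0, 0.5, -1.0, (1, 2.0, 0.5, 2.0)),  # split on x[0], then maybe on x[1]
    (1, 1.0, 0.25, -0.75),               # single split on x[1]
]
print(bart_predict(trees, [0.7, 1.5]))   # prints -0.25
```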

That’s the Balloon Analogue Risk Task, not Bayesian Additive Regression Trees!


Ahah, sorry! Fooled by an acronym!