Bayesian random forest

This and other papers by Wilson are some of those I meant. In addition, I was specifically thinking of this one: [1809.06452] Robustness Guarantees for Bayesian Inference with Gaussian Processes

Yes, this is true. What about Neal’s work on BNNs? What about Edward? TorchBNN? Shouldn’t there be some utility there?

Fundamentally, what the posterior looks like depends on the model. I am lost…

Thanks for the references on feature generation.

Of course there is some utility; I never claimed otherwise. Let me try again. Except for some trivial cases (like a NN equivalent to linear or logistic regression), no one knows how to integrate over the posterior of a NN in finite time with controlled integration error. Using MCMC or VI for NNs can improve predictive performance compared to, e.g., plain optimization, even if they are not producing accurate posterior approximations. Thus there is some utility, and I’m also fine with sometimes using machine learning to get useful predictions, but then we need to be aware of two potential problems:

1) it’s more difficult to know what would happen if we used more computation time (early stopping is also known to be beneficial in machine learning), and
2) it’s impossible to separate the actual model and prior from the implicit prior produced by the biased integration.

BNNs with MCMC and VI are just like other machine learning algorithms, i.e., they take inspiration from other fields, but in the end the utility is measured by repeated experiments with different training and test data sets.
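To make the point concrete, here is a minimal sketch, assuming nothing beyond NumPy: random-walk Metropolis over a toy one-hidden-unit "network" (the data, tuning constants, and names are all illustrative, not from any package mentioned above). The chain is asymptotically exact, but for any finite run the integration error over the NN posterior is uncontrolled, which is the situation above in miniature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = tanh(2x) + noise.
x = rng.uniform(-2, 2, size=40)
y = np.tanh(2 * x) + 0.1 * rng.normal(size=40)

def log_posterior(w, x, y, sigma=0.1, prior_sd=1.0):
    """Unnormalized log posterior of a 1-hidden-unit net f(x) = w2*tanh(w1*x) + b."""
    w1, w2, b = w
    pred = w2 * np.tanh(w1 * x) + b
    log_lik = -0.5 * np.sum((y - pred) ** 2) / sigma**2
    log_prior = -0.5 * np.sum(w**2) / prior_sd**2
    return log_lik + log_prior

# Random-walk Metropolis: correct in the limit, but nothing bounds
# the integration error of a finite chain over this posterior.
w = np.zeros(3)
lp = log_posterior(w, x, y)
draws = []
for _ in range(20000):
    prop = w + 0.1 * rng.normal(size=3)
    lp_prop = log_posterior(prop, x, y)
    if np.log(rng.uniform()) < lp_prop - lp:
        w, lp = prop, lp_prop
    draws.append(w.copy())
draws = np.array(draws[10000:])  # discard burn-in

# Predictive mean at a test point: Monte Carlo average over posterior draws.
x_test = 0.5
preds = draws[:, 1] * np.tanh(draws[:, 0] * x_test) + draws[:, 2]
print(f"posterior predictive mean at x=0.5: {preds.mean():.3f}")
```

Whether the predictive mean from such a truncated chain is any good can only be checked the way the paragraph above says: repeated experiments on held-out data.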


What about the ARD claimed by Neal? Feature importance could be established with a random forest, but based on this thread a Bayesian random forest is even less tractable than a BNN. What does one have to keep in mind when deciding that a feature is not important because the 95% CR of its weight estimates (from the input layer to the first hidden layer) is close to 0?
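One thing to keep in mind: the mechanical check is easy (a sketch below, with a made-up array of posterior draws standing in for real sampler output), but per-weight credible intervals in a NN are hard to interpret because the posterior has sign-flip and hidden-unit permutation symmetries. Neal’s ARD instead puts a shared hierarchical scale on all weights leaving a given input and inspects the posterior of that scale, which sidesteps the per-weight ambiguity.

```python
import numpy as np

# Hypothetical posterior draws of input-to-hidden weights,
# shape (n_draws, n_inputs, n_hidden); replace with your sampler's output.
rng = np.random.default_rng(1)
w_draws = rng.normal(0, 1, size=(4000, 5, 8))
w_draws[:, 2, :] *= 0.01  # pretend input 2 is irrelevant

# Per-weight 95% credible intervals and whether each covers 0.
lo, hi = np.percentile(w_draws, [2.5, 97.5], axis=0)
covers_zero = (lo < 0) & (hi > 0)

for j in range(w_draws.shape[1]):
    print(f"input {j}: {covers_zero[j].sum()}/{w_draws.shape[2]} "
          "hidden-unit weights have a 95% CR covering 0")
```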

I’m not sure Savage’s comparison between random forests and logistic regression is entirely fair. The random forest was not shown activities with missing user IDs (10% of the data), whereas the logistic regression was given the full dataset. If the random forest were one that also handles missing data, I wonder how different the prediction accuracy would be.
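For what it’s worth, here is a sketch of what a more even-handed comparison could look like, using entirely synthetic data (nothing here comes from Savage’s setup; a numeric feature with ~10% missingness stands in for the missing user IDs). Both models get the same imputed dataset plus a missingness indicator, so neither is handicapped by a smaller effective sample.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 6 features, ~10% of rows missing feature 0.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)
X[rng.uniform(size=1000) < 0.10, 0] = np.nan

# Identical preprocessing for both models: impute and add a
# missingness-indicator column, so both see all 1000 rows.
models = {
    "random forest": make_pipeline(
        SimpleImputer(strategy="mean", add_indicator=True),
        RandomForestClassifier(random_state=0),
    ),
    "logistic regression": make_pipeline(
        SimpleImputer(strategy="mean", add_indicator=True),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    ),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {acc.mean():.3f} +/- {acc.std():.3f}")
```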