Can Stan wrap (or reimplement) R. Neal's
C software for Flexible Bayesian Modeling and Markov Chain Sampling? I want to use his code (or an equivalent one) from
Sorry for not supplying more context: we used his code in a previous nuclear physics publication, and we were hoping to switch to Stan for the next one. What do you recommend instead?
which was co-written by Matt Hoffman who developed the original version of NUTS for Stan. You can do neural network stuff in Stan, but the posterior tends to be multimodal and it is hard to do full Bayesian inference with the resulting draws. There is more discussion along these lines at
Thanks for the link and info, will check it in more detail tomorrow!
By the way, Edward and PyMC3 also don't have support for this, right?
You can write down the posterior PDF of a neural network in almost any language. Sampling from the posterior distribution is the hard part.
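To make "writing down the posterior PDF" concrete, here is a minimal sketch (not from this thread) assuming a hypothetical one-hidden-layer tanh network with independent Gaussian priors on the weights and Gaussian observation noise; the function names, layer size, and scale parameters are all illustrative:

```python
import numpy as np

def nn_predict(w, X, hidden=3):
    """Tiny one-hidden-layer tanh network; w packs W1 (d x hidden), then W2 (hidden)."""
    d = X.shape[1]
    W1 = w[:d * hidden].reshape(d, hidden)
    W2 = w[d * hidden:d * hidden + hidden]
    return np.tanh(X @ W1) @ W2

def log_posterior(w, X, y, prior_sd=1.0, noise_sd=0.1, hidden=3):
    """Unnormalized log posterior: N(0, prior_sd^2) prior on every weight,
    Gaussian noise of scale noise_sd around the network's predictions."""
    log_prior = -0.5 * np.sum(w ** 2) / prior_sd ** 2
    resid = y - nn_predict(w, X, hidden=hidden)
    log_lik = -0.5 * np.sum(resid ** 2) / noise_sd ** 2
    return log_prior + log_lik
```

Writing this density down is a few lines in any language; drawing representative samples from it is the part that none of the tools discussed here solve reliably.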
Radford Neal's FBM alternates between Gibbs sampling the hyperparameters and HMC sampling the weights, so you can't use Stan to replicate that exactly. Matt Hoffman and some other people have used Stan to sample both hyperparameters and weights with the NUTS variant of HMC. Since sampling from a Bayesian neural network posterior is really hard, and it's really hard to know whether you are sampling from the whole posterior (even ignoring the multiple modes due to label switching and aliasing), it is likely that FBM, Stan, Edward, PyMC3, and any other software will fail to sample correctly from the whole posterior. The sampling may still produce useful predictive models, and then the question is whether you are happy with results where you don't know what is the effect of the model and priors and what is the effect of the algorithm (you could say then that it's machine learning).
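The alternation FBM uses can be sketched on a stand-in model; this hypothetical example swaps the neural network for a linear model with a N(0, 1/tau) prior on the weights and a conjugate Gamma hyperprior on the precision tau, so the Gibbs step for tau is exact while the weights get a plain leapfrog HMC update. FBM itself uses a neural network likelihood and far more careful tuning; everything below (model, constants, function names) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear model standing in for the network likelihood.
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)
a0, b0, noise_prec = 2.0, 1.0, 100.0  # Gamma(a0, b0) hyperprior, fixed noise precision

def log_post(w, tau):
    """Log posterior of weights given tau (up to a constant)."""
    return -0.5 * tau * w @ w - 0.5 * noise_prec * np.sum((y - X @ w) ** 2)

def grad(w, tau):
    """Gradient of log_post with respect to the weights."""
    return -tau * w + noise_prec * X.T @ (y - X @ w)

def hmc_update(w, tau, eps=0.01, L=20):
    """One leapfrog HMC trajectory for the weights, tau held fixed."""
    p0 = rng.normal(size=w.size)
    w_new, p = w.copy(), p0 + 0.5 * eps * grad(w, tau)
    for i in range(L):
        w_new = w_new + eps * p
        p = p + (eps if i < L - 1 else 0.5 * eps) * grad(w_new, tau)
    h0 = -log_post(w, tau) + 0.5 * p0 @ p0
    h1 = -log_post(w_new, tau) + 0.5 * p @ p
    return w_new if np.log(rng.uniform()) < h0 - h1 else w  # Metropolis accept/reject

def gibbs_tau(w):
    """Exact conjugate Gamma draw for the prior precision given the weights."""
    return rng.gamma(a0 + 0.5 * w.size, 1.0 / (b0 + 0.5 * w @ w))

w, tau = np.zeros(3), 1.0
for _ in range(500):
    tau = gibbs_tau(w)   # Gibbs step for the hyperparameter
    w = hmc_update(w, tau)  # HMC step for the weights
```

In Stan the conjugate Gibbs step is not expressible, which is why the NUTS-based approach samples tau and the weights jointly instead.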
Isn’t that the other way around in that they’re using neural nets to fit the curvature in the posterior rather than doing sampling to fit neural nets? I know they want to do the latter, too, so maybe it’s both.
Yes, you can use that algorithm in PyMC3. We don’t know how well it works, because HMC is so sensitive to tuning, and Gibbs changes the posterior.
Until someone comes up with a P = NP reduction, I don’t expect to see anyone fully explore the posterior of a neural network!
Yup. Not that there’s anything wrong with that. I want self-driving cars and better product recommendations, too.