Resources on how Stan works under the hood?

Hi devs,
I have been using Stan for hierarchical modeling on large datasets, and it has proved useful and powerful. I would like to know whether there are resources for learning what goes on under the hood of Stan. I'd like to understand the general numerical procedure at a low-to-mid level, mainly so I can build intuition for troubleshooting. I have some very high-level knowledge, but I'd like to go deeper. You have all been very responsive and helpful, but it would be good to teach myself as I continue to use Stan for more modeling work.

Another critical reason for seeking these resources is that the models I develop in Stan will eventually be submitted to the FDA and EMA for regulatory endorsement through different qualification pathways. When that happens, I'll need to be more fluent than I am now to answer the questions they might have.

Thanks,
Jackson

Hi Jackson,

If you search for Michael Betancourt's arXiv work on HMC, you'll find several papers on NUTS/HMC, including at least one conceptual introduction as well as quite a few math-heavy articles.

K

https://arxiv.org/a/betancourt_m_1.html

You will probably want to start with the review, https://arxiv.org/abs/1701.02434.

Great, thanks for the references. I'm sure they will keep me busy for quite some time. My background is in applied math, so I should be able to digest them to a reasonable degree.

There are various levels of what's going on. Are you trying to understand Bayesian modeling, Markov chain Monte Carlo in general, Stan's MCMC in particular, the low-level numerical details of the algorithms and derivatives, or what? Part of our process chapter on reproducibility was aimed at FDA-like regulatory concerns about bit-level reproducibility (which Stan maintains, all else being equal).

Radford Neal's overview of HMC in the Handbook of Markov Chain Monte Carlo is nice (and free online as a sample chapter); I also really liked the descriptions in MacKay's information theory book (also free online; go Cambridge University Press!). The original NUTS paper is also worth reading, though a lot of the subtle details get lost in the weeds, and almost all of the analysis is in later work from Michael Betancourt.

If you go ahead with the FDA submission, you might want to reach out to some of the people involved in promoting Bayesian methods at the FDA, like Frank Harrell, Jr. or Brad Carlin.

Hi Bob,
Thanks for the additional resources. I'll certainly look into them. As for the FDA folks, yes, those names will be good to have for the future. We work quite a bit with the Office of Clinical Pharmacology and the Division of Pharmacometrics, as well as the biostatistics department, which the people you mentioned may be part of.