Jonah and I are putting the finishing touches on our beginner Stan workshop for StanCon Cambridge. I thought it would be nice to draw on the wisdom of the Stan community. What things do you wish past you had known when starting Stan? What top tips can you give for people just starting?
Maybe outside the scope of your workshop and probably too simplistic, but learning how to use print statements inside the Stan code was a good way to better understand what could be going wrong, and it helped with model building. For me at least.
That’s a great idea! We wanted to cover how to debug, and print statements are particularly useful for those more used to scripting in R. :) Thanks! :)
Jonah might not want to hear this, but shinystan lol. It’s way easier to figure out what all the different output options are by clicking around a webpage. And +1 to the print debugging. I don’t think I knew that when I started either and it’s super useful.
This advice comes partly from my experience starting with Stan and being active (or procrastinating) on the forum.
There are a couple of “contradictions” inherent in using Stan that trip up people with a certain background.
“Everything is a sample” and “Stan ~ statements don’t actually sample”.
By “everything is a sample” I mean that every parameter is represented by a distribution, which is approximated by a sample of draws from that distribution. No point estimates. Samples are really flexible to work with, and you can investigate the distribution of complicated functions of the parameters. This lets you set up a generative model and then use the samples to answer your question. I think this trips up people coming from a background like applied micro econ(ometrics), where you try to specify a model in which one parameter answers your question and the rest are nuisance parameters.
The confusing part is that the tilde statements don’t actually sample from a distribution; they add a term to the log density. As an R user I always have to remind myself that those statements are not rnorm. I think this trips up people coming from BUGS and JAGS.
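To make both halves of that concrete, here is a minimal Python sketch (not Stan itself). The function log_density plays the role of a model block with two tilde statements: it only scores a value of mu and never draws anything. A separate algorithm produces the draws; here that is a plain random-walk Metropolis stand-in, whereas Stan actually uses HMC/NUTS. The data, the normal(0, 10) prior, and all function names are made up for illustration.

```python
import math
import random
import statistics


def normal_lpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def log_density(mu, y):
    """Plays the role of a Stan model block with two tilde statements.

    mu ~ normal(0, 10);  is roughly  target += normal_lpdf(mu | 0, 10);
    y ~ normal(mu, 1);   is roughly  target += normal_lpdf(y | mu, 1);
    (The ~ form also drops constant terms, which doesn't affect sampling.)
    Nothing is drawn here: for a given mu the result is always the same.
    """
    target = normal_lpdf(mu, 0.0, 10.0)
    for y_n in y:
        target += normal_lpdf(y_n, mu, 1.0)
    return target


def random_walk_metropolis(y, n_draws=20000, step=0.5, seed=1):
    """Stand-in sampler (Stan itself uses HMC/NUTS, not this)."""
    rng = random.Random(seed)
    mu = 0.0
    lp = log_density(mu, y)
    draws = []
    for _ in range(n_draws):
        proposal = mu + rng.gauss(0.0, step)
        lp_proposal = log_density(proposal, y)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_proposal - lp:
            mu, lp = proposal, lp_proposal
        draws.append(mu)
    return draws


y = [1.2, 0.8, 1.5, 0.9, 1.1]              # made-up data
draws = random_walk_metropolis(y)[2000:]    # discard warmup draws

# "Everything is a sample": any function of mu is summarized from the draws.
post_mean = statistics.mean(draws)
prob_mu_above_1 = sum(d > 1.0 for d in draws) / len(draws)
exp_mu_median = statistics.median(math.exp(d) for d in draws)
```

Swapping in a different sampler would not change log_density at all. That separation is the mental model: the model only scores parameter values, the algorithm produces the draws, and questions about derived quantities are answered by transforming the draws.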
“Stan allows you to specify flexible models” and “Simple models take too long in Stan”.
The problem here is that beginners either hit computational problems or run into the folk theorem (when you have computational problems, there is often a problem with your model). Coming from a frequentist regression background, it’s confronting that you suddenly need to care about centering variables, the scale of parameters, different parameterizations, vectorizing, and precomputing values … Or that badly fitting models can take a long time to run.
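To make the centering point concrete, here is a small Python sketch. It assumes the textbook simplification of a linear regression with flat priors and a known noise scale, where the posterior covariance of (intercept, slope) is proportional to (X'X)^{-1} for the design X = [1, x]; the predictor values and the function name are made up for illustration.

```python
import math


def intercept_slope_correlation(x):
    """Correlation between intercept and slope implied by (X'X)^{-1}
    for the design X = [1, x] (flat priors, known noise scale)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    # (X'X)^{-1} is proportional to [[sxx, -sx], [-sx, n]]
    return -sx / math.sqrt(sxx * n)


x = [10.0, 11.0, 12.0, 13.0, 14.0]
x_mean = sum(x) / len(x)
x_centered = [v - x_mean for v in x]

raw = intercept_slope_correlation(x)                 # about -0.993
centered = intercept_slope_correlation(x_centered)   # 0.0
```

A correlation near -1 means the posterior is a long, narrow ridge, the kind of geometry that forces a sampler to take small steps. Centering removes the correlation without changing the fit; the intercept is simply reinterpreted as the expected outcome at the mean of x.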
“Stan allows you to specify flexible models” and “Stan enforces variable types”.
Stan is pretty unforgiving as a programming language when you are coming from R. The trick of working with integers by pretending they are reals doesn’t work. Arrays are different from matrices and vectors, which are also different from each other.
Just to be clear: this is not by any means meant as a criticism of Stan. I don’t mean that Stan needs to change. I also know that these are not really contradictions but I think they sometimes look like it for beginners. Maybe making some of these apparent contradictions explicit could help introduce the right mental model of what Stan does.
I remember it taking me a long time to discern the purpose of the transformed parameters section, which I now understand is (mostly?) about whether you want a record of intermediate computations or not (i.e. if not, just define the variables in the model block).
Join the Slack channel mc-stan.slack.com! It could become a good place for debugging support and for airing very general or off-beat questions.
Something that took me a while to discover is the use of nested blocks when declaring work variables in the transformed parameters or model blocks. When you do this, variables declared in the inner block don’t appear in the final stanfit object, which makes the object lighter and easier to manipulate.
How to make practical use of Stan with minimal statistical background or education, i.e. how to formulate a problem in the language of a model.
When I originally designed the transformed parameters block, it was literally where we were going to put transformed parameters: functions of other parameters and data. It didn’t occur to me that people would want those and not want to save them. The original plan was to allow Jacobians to be updated with something like
jacobian__ = jacobian__ + ...;
Without that, you can’t actually implement Stan’s variable transforms within Stan. Stan turns off Jacobians for optimization and leaves them on for MCMC and VB.
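For readers wondering what the Jacobian adjustment does numerically, here is a small Python sketch. It assumes a parameter sigma > 0 with an Exponential(1) density, sampled on the unconstrained scale u = log(sigma) (the log transform Stan uses for a lower bound of zero). Since sigma = exp(u), the adjustment is log|d sigma / d u| = u. The helper names and the integration grid are illustrative choices, not Stan internals.

```python
import math


def exponential_lpdf(sigma, rate=1.0):
    """Log density of Exponential(rate) on the constrained scale sigma > 0."""
    return math.log(rate) - rate * sigma


def unconstrained_lpdf(u, jacobian=True):
    """Log density of u = log(sigma) when sigma ~ Exponential(1).

    sigma = exp(u), so |d sigma / d u| = exp(u) and log|J| = u.
    """
    lp = exponential_lpdf(math.exp(u))
    if jacobian:
        lp += u  # the Jacobian adjustment
    return lp


def integrate(f, lo=-30.0, hi=5.0, n=35000):
    """Trapezoidal rule on [lo, hi]; the tails outside are negligible here."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


with_jacobian = integrate(lambda u: math.exp(unconstrained_lpdf(u)))
without_jacobian = integrate(
    lambda u: math.exp(unconstrained_lpdf(u, jacobian=False)))
```

Without the log|J| term the density on u does not integrate to 1 (the integral actually diverges as the lower limit goes to minus infinity). The two versions also peak in different places: with the Jacobian, u - exp(u) is maximized at u = 0 (sigma = 1), while the density of sigma itself is highest at sigma = 0. That is one way to see why optimization, which is after the mode in the original parameters, leaves the Jacobian out, while MCMC and VB need it to target the right distribution.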