SAMPLING FOR MODEL ‘8c349b2a7e185059d867bb3fe6b8746e’ NOW (CHAIN 1).
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: lognormal_lpdf: Random variable[1] is -1.56634, but must be >= 0! (in ‘model120c70da6cea_8c349b2a7e185059d867bb3fe6b8746e’ at line 49)
Rejecting initial value:
Error evaluating the log probability at the initial value.
Exception: beta_lpdf: Second shape parameter[1] is 0, but must be > 0! (in ‘model120c70da6cea_8c349b2a7e185059d867bb3fe6b8746e’ at line 58)
I don’t really understand init or seed in rstan::stan(), so I don’t know what to change either of them to. Could you explain these two arguments or point me to a reference that explains them well?
init refers to the initial parameter values that Stan starts HMC/NUTS from. If these values don’t produce a finite log posterior, Stan has to draw new initial values until it finds a starting point that does. If no initializer function or list is passed to rstan::stan(), Stan draws each initial value uniformly from the interval (-2, 2) on the unconstrained scale, using a pseudorandom number generator (RNG). A pseudo-RNG is deterministic in the sense that the same seed always gives the same draws, so changing the seed will yield a different set of initial values, which may or may not have a finite log posterior. See pg 42 of the rstan manual for more details.
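To make the seed behaviour concrete, here is a minimal Python sketch of the idea (not Stan's actual code). The toy log density, the function names, and the cutoff at 1 are all made up for illustration; the only things taken from the explanation above are the Uniform(-2, 2) draws and the retry-until-finite loop. The cap of 100 attempts is an assumption chosen for the sketch.

```python
import math
import random


def toy_log_density(u):
    """Hypothetical log density that is non-finite whenever u < 1.

    It stands in for a model where some starting points make a
    quantity invalid, as in the lognormal_lpdf error in the log.
    """
    if u < 1.0:
        return -math.inf
    return -0.5 * (u - 1.5) ** 2


def draw_init(seed, max_attempts=100):
    """Draw from Uniform(-2, 2) until the toy log density is finite.

    Returns the accepted value and the number of attempts it took.
    max_attempts=100 is an assumption for this sketch.
    """
    rng = random.Random(seed)  # same seed -> same sequence of draws
    for attempt in range(1, max_attempts + 1):
        u = rng.uniform(-2.0, 2.0)
        if math.isfinite(toy_log_density(u)):
            return u, attempt
    raise RuntimeError("no finite starting point found")


if __name__ == "__main__":
    print(draw_init(seed=1))  # repeatable: rerunning gives the same pair
    print(draw_init(seed=2))  # a different seed gives different draws
```

Each rejected draw in the loop corresponds to one "Rejecting initial value" message in the output above. A different seed changes which draws get rejected, but does nothing to make the bad region smaller.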
Another approach is to specify a list or function that returns initial parameter values that you know will yield a finite log posterior. If you do this, you should use different initial values for each Markov chain to get more robust diagnostics.
I would do something like this, but with some RNG noise added to each chain initialization (I usually initialize by sampling from the parameter priors in R).
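As a rough, language-neutral illustration of "one set of prior draws per chain", here is a Python sketch. The parameter names sigma and theta and their priors (lognormal and beta, echoing the two densities in the error messages) are assumptions, not taken from the actual model. In rstan, the equivalent object would be a list containing one named list per chain, or a function that returns one such named list.

```python
import random


def make_inits(n_chains, seed):
    """Return one dict of initial values per chain, drawn from priors.

    "sigma" and "theta" are hypothetical parameter names; replace the
    draws with the priors of your own model.
    """
    rng = random.Random(seed)
    inits = []
    for _ in range(n_chains):
        inits.append({
            # lognormal draws are strictly positive
            "sigma": rng.lognormvariate(0.0, 1.0),
            # Beta(2, 2) draws lie between 0 and 1
            "theta": rng.betavariate(2.0, 2.0),
        })
    return inits


if __name__ == "__main__":
    # Build the list from the same number you pass as the chain count,
    # so the two cannot get out of sync.
    n_chains = 4
    for chain_id, init in enumerate(make_inits(n_chains, seed=2024), start=1):
        print(chain_id, init)
```

Because every chain gets its own draw, the starting points are dispersed, and because they come from the priors, they respect the parameter constraints by construction.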
That’s great. Thanks @ScottAlder. EDIT: just tested. It notably reduced the number of “Error evaluating the log probability at the initial value” messages compared to the other approach. Thank you very much for this very helpful insight.
Just to make sure that I am doing it right - it might be that I am not doing it right.
The first one would ensure that the list of initial values and the number of chains match, so that one is probably best. Otherwise, running on the default number of chains (4) won’t break it.
Also, I should try to explain why you should start each Markov chain at different initial values. Assume you have a multimodal posterior; it’s better to find out sooner rather than later that it is multimodal. There’s a chance that your Markov chains will settle down in separate modes, which will cause the R_hat statistic to be much greater than 1. That is a warning that the chains have not mixed, and multimodality is one possible cause. This will also result in a traceplot like this (source):
Starting all of your Markov chains at the same initial values makes it more likely that, if the posterior is multimodal, the chains will all settle down into the same mode and you won’t find out until you run your model for the n’th time and accidentally find a new mode. Dispersed initial values increase the chance of finding separate modes, if there are any.
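To see how chains stuck in separate modes inflate R_hat, here is a small Python sketch using the classic (non-split) Gelman-Rubin formula. Stan's own R_hat uses a more refined split-chain variant, so treat the numbers as qualitative only. The "chains" are just independent normal draws around hypothetical mode locations, not output from a real sampler.

```python
import math
import random
from statistics import mean, variance


def rhat(chains):
    """Classic Gelman-Rubin R_hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W
    between_over_n = variance(chain_means)         # B / n
    var_plus = (n - 1) / n * within + between_over_n
    return math.sqrt(var_plus / within)


def simulate(centers, n=500, seed=0):
    """Fake chains: each one is normal noise around its own center."""
    rng = random.Random(seed)
    return [[rng.gauss(mu, 1.0) for _ in range(n)] for mu in centers]


# All four chains in the same mode versus split across two modes.
same_mode = simulate([5.0, 5.0, 5.0, 5.0])
two_modes = simulate([-5.0, 5.0, -5.0, 5.0])

if __name__ == "__main__":
    print("same mode:", round(rhat(same_mode), 3))
    print("two modes:", round(rhat(two_modes), 3))
```

When every chain sits in the same mode, the between-chain variance is tiny and R_hat stays close to 1, even if a second mode exists that no chain ever visited. That is exactly the failure that dispersed initial values are meant to make less likely.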
That actually is very interesting and helpful because I was concerned that my response variable distribution was multimodal and was looking into how to test that.