Initial value different from specified initial value

Hi everyone,

I am trying to understand the ‘initial values’ in one of my models. I specify some initial values, and they are recognized by Stan, as can be seen in the first screenshot:

Now, when I look at the sampled values, I can see they don’t perfectly match my initial values. From what I understand (and have read elsewhere), I should think of the initial values as ‘timestep zero’, and the first values in my samples are then ‘timestep one’, meaning after NUTS has wandered around a bit in parameter space. This makes sense to me and explains why the values are near, but not identical to, my specified initial values. However, for the two parameters ‘pdd1_pop’ and ‘dtopt_vpop’, the first sampled values are very different from the initial values: 34 for pdd1_pop and 19 for dtopt_vpop. Both are further from where the chains eventually settle than the initial values I specified:

Both these parameters have a lower bound of 0, but many of the other parameters have that as well. It is unintuitive to me why some parameters would change so drastically while others don’t change much at all. Also, I don’t fully understand the warm-up phase in NUTS, so maybe this behaviour is expected.

Any hint or explanation would be greatly appreciated.

Cheers,

Friedrich

If the gradient with respect to those parameters is very steep at your initial point, this could explain it. You can check using grad_log_prob, which takes the parameter values on the unconstrained scale. See the rstan documentation page for the log_prob and grad_log_prob functions (log_prob-methods).
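One thing to keep in mind for parameters with a lower bound of 0: Stan samples on the unconstrained scale, where such a parameter is represented by its logarithm, and the log density there includes a Jacobian term. Here is a minimal Python sketch of that chain rule. It assumes a toy unnormalised normal density, and the values of `mu`, `sigma` and `theta_init` are made up for illustration; none of them come from the model in this thread.

```python
import math

def log_density_unconstrained(u, mu, sigma):
    """Toy log density on the unconstrained scale u = log(theta).

    Uses an unnormalised normal(mu, sigma) density for theta = exp(u)
    and adds the log-Jacobian of the transform, which is simply u.
    """
    theta = math.exp(u)
    return -0.5 * ((theta - mu) / sigma) ** 2 + u

def grad_unconstrained(u, mu, sigma):
    """Analytic gradient of log_density_unconstrained with respect to u."""
    theta = math.exp(u)
    dlogp_dtheta = -(theta - mu) / sigma ** 2   # gradient on the constrained scale
    return theta * dlogp_dtheta + 1.0           # chain rule plus Jacobian term

# Hypothetical numbers, chosen only to make the effect visible.
mu, sigma = 5.0, 2.0
theta_init = 20.0
u_init = math.log(theta_init)

grad_theta = -(theta_init - mu) / sigma ** 2
grad_u = grad_unconstrained(u_init, mu, sigma)

print("gradient w.r.t. theta:     ", grad_theta)
print("gradient w.r.t. log(theta):", grad_u)
```

In this toy case the gradient on the unconstrained scale is the constrained one multiplied by theta, plus one, so an initial value far from zero can look much steeper to the sampler than the constrained-scale gradient suggests.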

Hi Jacob,

Thanks for your reply. Here are the gradients at the initial values:

The first one is pdd1_pop, and the fifth one is dtopt_vpop. So yes, they are the parameters with the steepest gradients. And since the gradient for pdd1_pop is negative and the one for dtopt_vpop is positive, can we expect the first sampled value for pdd1_pop to be smaller than its initial value, and the first sampled value for dtopt_vpop to be larger than its initial value?

That would be the first-order expectation, yes. It’s possible that the gradients for these two parameters twist through some odd shape that flips the sign (while the remainder of the parameters wander around in a way that prevents the no-U-turn criterion from being triggered), but I think this would not be very likely.
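To make the first-order expectation concrete, here is a rough Python sketch. It uses plain fixed-length leapfrog integration rather than NUTS, a one-dimensional toy target (a standard normal), and made-up values for the starting point, step size and number of steps, so treat it as an illustration of the idea and not of what Stan does internally. The momentum is drawn at random for each trajectory, so individual trajectories can go either way, but the average displacement should follow the sign of the gradient at the starting point.

```python
import random

def grad_log_p(q, mu=0.0, sigma=1.0):
    """Gradient of the toy normal(mu, sigma) log density at q."""
    return -(q - mu) / sigma ** 2

def leapfrog(q, p, eps, n_steps):
    """Run n_steps leapfrog steps of size eps from position q, momentum p."""
    p += 0.5 * eps * grad_log_p(q)          # initial half step for momentum
    for step in range(n_steps):
        q += eps * p                        # full step for position
        if step < n_steps - 1:
            p += eps * grad_log_p(q)        # full step for momentum
    p += 0.5 * eps * grad_log_p(q)          # final half step for momentum
    return q, p

rng = random.Random(1)
q_init = 5.0                 # hypothetical starting point with a steep negative gradient
eps, n_steps = 0.1, 5        # hypothetical step size and trajectory length

displacements = []
for _ in range(2000):
    p0 = rng.gauss(0.0, 1.0)                # fresh random momentum, as in HMC
    q_end, _p_end = leapfrog(q_init, p0, eps, n_steps)
    displacements.append(q_end - q_init)

mean_displacement = sum(displacements) / len(displacements)
print("gradient at start:", grad_log_p(q_init))
print("mean displacement:", mean_displacement)
print("share moving against the gradient:",
      sum(d > 0 for d in displacements) / len(displacements))
```

With a negative gradient at the start, the mean displacement comes out negative, while some trajectories with a large opposing momentum still end up at a higher value. That is the sense in which the gradient sign is an expectation and not a guarantee.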