Difference in parametrization


I have a question regarding parametrization. I tried two things that yield slightly different results (and quite significant differences in convergence), but I am unsure why. I guess the question boils down to the difference between this:

beta ~ normal(0, sigma_beta);
y <- ... beta[ii];

and this:

beta ~ normal(0, 1);
y <- ... sigma_beta * beta[ii];

To me these look like two formulations of the same thing, namely applying the population scale factor `sigma_beta` to each individual `beta`. What am I missing?
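To make the comparison concrete, here is a minimal sketch of the two variants as fuller model fragments (the data block, `K`, `ii`, `alpha`, and `sigma` are placeholders standing in for the elided parts of my model):

```stan
// Variant 1 ("centered"): beta gets its scale directly from the prior.
parameters {
  real<lower=0> sigma_beta;
  vector[K] beta;
}
model {
  sigma_beta ~ normal(0, 1);
  beta ~ normal(0, sigma_beta);
  y ~ normal(alpha + beta[ii], sigma);
}

// Variant 2 ("non-centered"): beta_raw is standard normal,
// and the scale is applied where beta is used.
parameters {
  real<lower=0> sigma_beta;
  vector[K] beta_raw;
}
model {
  sigma_beta ~ normal(0, 1);
  beta_raw ~ normal(0, 1);
  y ~ normal(alpha + sigma_beta * beta_raw[ii], sigma);
}
```

In both cases, `beta[k]` (or `sigma_beta * beta_raw[k]`) is marginally distributed as normal(0, sigma_beta), so the two models define the same posterior.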


This is great stuff. Stan's algorithms operate in the space defined by the unconstrained parameters (probably `log(sigma_beta)` and `beta`). In the first case, the prior contribution of `beta` to the model's log density depends on the current value of `sigma_beta`; in the second case it does not! Models with less inter-dependence among parameters are easier to sample, so you'll see differences in convergence and, depending on the model/data, differences in estimates.
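To sketch why: in the centered version, each `beta[k]` contributes a prior term

$$
\log p(\beta_k \mid \sigma_\beta) = -\log \sigma_\beta - \frac{\beta_k^2}{2\sigma_\beta^2} + \text{const},
$$

which couples `beta` and `log(sigma_beta)` in the log density (this is the classic "funnel" geometry: as `sigma_beta` shrinks, the conditional distribution of `beta` narrows sharply). In the non-centered version the prior term is just $-\beta_k^2/2$, independent of `sigma_beta`; the two parameters interact only through the likelihood.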

Rather than re-invent the explanations, I'll point you to the manual and the literature: the terms you are looking for are "centered parameterization" / "non-centered parameterization" / "Matt trick" (what Stan people called it before they found the rest of the literature). The section of the manual you need is here: https://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html

With more difficult models, these kinds of re-parameterizations (and there are lots of them!) are key to using Stan effectively.
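As a footnote: recent Stan versions (2.19 and later) let you get the non-centered parameterization without renaming variables, using an affine transform in the declaration. A sketch, with `K` as a placeholder dimension:

```stan
parameters {
  real<lower=0> sigma_beta;
  // Declared on its natural scale, but sampled internally on the unit scale.
  vector<offset=0, multiplier=sigma_beta>[K] beta;
}
model {
  sigma_beta ~ normal(0, 1);
  beta ~ normal(0, sigma_beta);  // reads as centered, behaves as non-centered
}
```

This keeps the model block readable while giving the sampler the better-conditioned geometry.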


Thanks a lot!
