I have a question regarding parameterization. I tried two things that yield slightly different results, with quite significant differences in convergence, but I am unsure why. I guess the question boils down to the difference between:
beta ~ normal(0, sigma_beta);
y <- ... beta[ii];

and

beta ~ normal(0, 1);
y <- ... sigma_beta * beta[ii];
To me these look like two formulations of the same thing, namely applying the population scaling factor sigma_beta to each individual beta. What am I missing?
This is great stuff. Stan's algorithms operate in the space defined by the model's unconstrained parameters (probably beta in your case). In the first case, the prior contribution of beta to the model's log density depends on the current value of sigma_beta, and in the second case it does not! Models with less inter-dependence among parameters are easier to sample, so you'll see differences in convergence and, depending on the model and data, differences in estimates.
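To make that dependence concrete, here is what each prior statement adds to the log density (the target += forms are equivalent to the sampling statements, up to constants); note that the second term does not involve sigma_beta at all:

// centered: the prior term couples beta and sigma_beta
target += normal_lpdf(beta | 0, sigma_beta);

// non-centered: the prior term is a fixed standard normal
target += normal_lpdf(beta_raw | 0, 1);
// the scale enters only through the deterministic transform
// beta = sigma_beta * beta_raw;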
Rather than re-invent the explanations, I'll point you to the manual and the literature: the terms you are looking for are "centered parameterization" / "non-centered parameterization" / "Matt trick" (what Stan people called it before they found the rest of the literature). The section of the manual you need is here: https://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html
With more difficult models these kinds of re-parameterizations (and there are lots of them!) are key to using Stan effectively.
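For reference, a minimal non-centered version of a varying-intercept model might look like the sketch below. Since your model is only sketched above, the data layout (N observations, J groups, a group index ii) and the extra parameters (a global intercept mu, observation noise sigma) and their priors are all assumptions:

data {
  int<lower=1> N;                 // number of observations
  int<lower=1> J;                 // number of groups
  int<lower=1, upper=J> ii[N];    // group index per observation (assumed layout)
  vector[N] y;
}
parameters {
  real mu;                        // global intercept (assumed)
  real<lower=0> sigma;            // observation noise (assumed)
  real<lower=0> sigma_beta;       // population scale for beta
  vector[J] beta_raw;             // unit-scale raw parameters
}
transformed parameters {
  // non-centered parameterization: the scale is applied here,
  // so the prior on beta_raw does not depend on sigma_beta
  vector[J] beta = sigma_beta * beta_raw;
}
model {
  beta_raw ~ normal(0, 1);        // fixed prior, independent of sigma_beta
  sigma_beta ~ normal(0, 1);      // illustrative prior, an assumption
  sigma ~ normal(0, 1);           // illustrative prior, an assumption
  mu ~ normal(0, 1);              // illustrative prior, an assumption
  y ~ normal(mu + beta[ii], sigma);
}

Sampling beta_raw ~ normal(0, 1) and scaling it deterministically is exactly your second formulation; Stan samples beta_raw, and you can still monitor beta as a transformed parameter.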