Difference in parametrization


I have a question regarding parametrization. I tried two things that yield slightly different results (and quite significant differences in convergence), but I am unsure why. I guess the question boils down to the difference between this:

beta ~ normal(0, sigma_beta);
y <- ... beta[ii];

and this:

beta ~ normal(0, 1);
y <- ... sigma_beta * beta[ii];

To me these look like two formulations of the same thing, namely applying the population scale factor `sigma_beta` to each individual `beta`. What am I missing?
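To make the comparison concrete, here is a minimal sketch of the two variants as fuller model fragments (the data block, `K`, `ii`, `alpha`, and `sigma` are placeholders standing in for the elided parts of my model):

```stan
// Variant 1 ("centered"): beta gets its scale directly from the prior.
parameters {
  real<lower=0> sigma_beta;
  vector[K] beta;
}
model {
  sigma_beta ~ normal(0, 1);
  beta ~ normal(0, sigma_beta);
  y ~ normal(alpha + beta[ii], sigma);
}

// Variant 2 ("non-centered"): beta_raw is standard normal,
// and the scale is applied where beta is used.
parameters {
  real<lower=0> sigma_beta;
  vector[K] beta_raw;
}
model {
  sigma_beta ~ normal(0, 1);
  beta_raw ~ normal(0, 1);
  y ~ normal(alpha + sigma_beta * beta_raw[ii], sigma);
}
```

In both cases, `beta[k]` (or `sigma_beta * beta_raw[k]`) is marginally distributed as normal(0, sigma_beta), so the two models define the same posterior.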


This is great stuff. Stan's algorithms operate in the space defined by the unconstrained parameters (probably `log(sigma_beta)` and `beta`). In the first case, the prior contribution of `beta` to the model's log density depends on the current value of `sigma_beta`; in the second case it does not! Models with less inter-dependence among parameters are easier to sample, so you'll see differences in convergence and, depending on the model/data, differences in estimates.
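To sketch why: in the centered version, each `beta[k]` contributes a prior term

$$
\log p(\beta_k \mid \sigma_\beta) = -\log \sigma_\beta - \frac{\beta_k^2}{2\sigma_\beta^2} + \text{const},
$$

which couples `beta` and `log(sigma_beta)` in the log density (this is the classic "funnel" geometry: as `sigma_beta` shrinks, the conditional distribution of `beta` narrows sharply). In the non-centered version the prior term is just $-\beta_k^2/2$, independent of `sigma_beta`; the two parameters interact only through the likelihood.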

Rather than re-invent the explanations, I'll point you to the manual and the literature: the terms you are looking for are "centered parameterization" / "non-centered parameterization" / "Matt trick" (what Stan people called it before they found the rest of the literature). The section of the manual you need is here: https://mc-stan.org/docs/2_18/stan-users-guide/reparameterization-section.html

With more difficult models, these kinds of re-parameterizations (and there are lots of them!) are key to using Stan effectively.
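As a footnote: recent Stan versions (2.19 and later) let you get the non-centered parameterization without renaming variables, using an affine transform in the declaration. A sketch, with `K` as a placeholder dimension:

```stan
parameters {
  real<lower=0> sigma_beta;
  // Declared on its natural scale, but sampled internally on the unit scale.
  vector<offset=0, multiplier=sigma_beta>[K] beta;
}
model {
  sigma_beta ~ normal(0, 1);
  beta ~ normal(0, sigma_beta);  // reads as centered, behaves as non-centered
}
```

This keeps the model block readable while giving the sampler the better-conditioned geometry.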


Thanks a lot!
