I’m following up on this discussion from the old stan-users mailing list regarding non-centered parameterization of variance parameters:
Basically, Julian found that using a non-centered parameterization on the variance parameter caused subtle differences in the parameter estimates. It seems the question was never resolved, and I came across the thread when I ran into the same thing.
It seems to me that the centered and non-centered versions are actually different models, and that the issue would extend beyond Julian’s example. Working through Julian’s example:
The joint log density looks like (up to additive constants):
log(p(hp | dat1, dat2)) = -n*log(sig1) - sum(dat1^2)/(2*sig1^2) - n*log(sig2) - sum(dat2^2)/(2*sig2^2) - 2*log(hp) + log(sig1) - sig1/hp - 2*log(hp) + log(sig2) - sig2/hp
and for the non-centered version, writing sig1 = c1_alt*hp_alt and sig2 = c2_alt*hp_alt:

log(p(hp_alt | dat1, dat2)) = -n*log(c1_alt*hp_alt) - sum(dat1^2)/(2*(c1_alt*hp_alt)^2) - n*log(c2_alt*hp_alt) - sum(dat2^2)/(2*(c2_alt*hp_alt)^2) + log(c1_alt) - c1_alt + log(c2_alt) - c2_alt
which, collecting the log terms, is

log(p(hp_alt | dat1, dat2)) = -sum(dat1^2)/(2*(c1_alt*hp_alt)^2) - sum(dat2^2)/(2*(c2_alt*hp_alt)^2) - (n-1)*log(c1_alt) - (n-1)*log(c2_alt) - 2*n*log(hp_alt) - c1_alt - c2_alt
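As a numeric sanity check on these expressions — under my reading of the example, which may not match the original thread exactly: dat1 and dat2 are each n independent normal(0, sig_i) observations, sig1 and sig2 have gamma(shape 2, scale hp) priors, and the non-centered version samples c_i_alt ~ gamma(2, 1) and sets sig_i = c_i_alt*hp_alt — each "up to additive constants" formula can be compared against the exact log densities; the gap should be the same constant at any parameter values:

```python
import math
import random

def norm_logpdf(x, s):
    # normal(0, s) log density
    return -0.5*math.log(2*math.pi) - math.log(s) - x*x/(2*s*s)

def gamma2_logpdf(x, scale):
    # gamma with shape 2 and the given scale; lgamma(2) = 0
    return math.log(x) - 2*math.log(scale) - x/scale

def centered_formula(dat1, dat2, sig1, sig2, hp):
    # the centered expression, up to additive constants
    n = len(dat1)
    s1 = sum(x*x for x in dat1); s2 = sum(x*x for x in dat2)
    return (-n*math.log(sig1) - s1/(2*sig1**2) - n*math.log(sig2) - s2/(2*sig2**2)
            - 2*math.log(hp) + math.log(sig1) - sig1/hp
            - 2*math.log(hp) + math.log(sig2) - sig2/hp)

def centered_exact(dat1, dat2, sig1, sig2, hp):
    # normal likelihoods plus gamma(shape 2, scale hp) priors, no terms dropped
    return (sum(norm_logpdf(x, sig1) for x in dat1)
            + sum(norm_logpdf(x, sig2) for x in dat2)
            + gamma2_logpdf(sig1, hp) + gamma2_logpdf(sig2, hp))

def noncentered_formula(dat1, dat2, c1, c2, hp):
    # my collected version of the simplified non-centered expression
    n = len(dat1)
    s1 = sum(x*x for x in dat1); s2 = sum(x*x for x in dat2)
    return (-s1/(2*(c1*hp)**2) - s2/(2*(c2*hp)**2)
            - (n-1)*math.log(c1) - (n-1)*math.log(c2) - 2*n*math.log(hp)
            - c1 - c2)

def noncentered_exact(dat1, dat2, c1, c2, hp):
    # likelihood at sig_i = c_i*hp plus gamma(2, 1) log densities log(c) - c
    return (sum(norm_logpdf(x, c1*hp) for x in dat1)
            + sum(norm_logpdf(x, c2*hp) for x in dat2)
            + math.log(c1) - c1 + math.log(c2) - c2)

random.seed(1)
dat1 = [random.gauss(0, 1.0) for _ in range(20)]
dat2 = [random.gauss(0, 2.0) for _ in range(20)]

# exact minus formula should be the same constant at different parameter values
gap_c = [centered_exact(dat1, dat2, s1, s2, h) - centered_formula(dat1, dat2, s1, s2, h)
         for s1, s2, h in [(0.8, 1.7, 0.9), (2.3, 0.5, 1.6)]]
gap_n = [noncentered_exact(dat1, dat2, c1, c2, h) - noncentered_formula(dat1, dat2, c1, c2, h)
         for c1, c2, h in [(0.7, 1.2, 1.1), (1.9, 0.4, 0.6)]]
print(abs(gap_c[0] - gap_c[1]) < 1e-9, abs(gap_n[0] - gap_n[1]) < 1e-9)  # True True
```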
So, if I did my math right, log(p(hp | dat1, dat2)) and log(p(hp_alt | dat1, dat2)) will differ by -n*(log(sig1) + log(sig2)).
So I believe the reparameterization is equivalent to imposing the gamma(2+n, epsilon) prior from Chung et al. (2015) on the original model?
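The step from an extra n*log(sig) term to a shape-(2+n) gamma is just the identity that multiplying a gamma density by sig^n shifts its shape by n, i.e. the two log densities differ by n*log(sig) plus a constant. A quick plain-Python check of that identity (my own illustration, with arbitrary n and epsilon):

```python
import math

def gamma_logpdf(x, shape, rate):
    # log density of gamma(shape, rate)
    return shape*math.log(rate) - math.lgamma(shape) + (shape - 1)*math.log(x) - rate*x

n, eps = 20, 0.01
# gamma(2+n, eps) vs gamma(2, eps): the log densities differ by n*log(x) + constant
d = [gamma_logpdf(x, 2 + n, eps) - gamma_logpdf(x, 2, eps) - n*math.log(x)
     for x in (0.3, 1.0, 4.5)]
print(max(d) - min(d) < 1e-9)  # True: the leftover difference is constant in x
```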
It also seems like this would be the case with normal distributions as well? My understanding from the mailing list was that the ‘Matt Trick’ was just a computational tool to speed up convergence, but working through this example, it seems like it also implicitly puts a weakly informative prior on the variance parameters.
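For reference, the ‘Matt Trick’ for a normal location-scale parameter is the deterministic reparameterization x = mu + sigma*z with z ~ normal(0, 1), which reproduces the same distribution under forward simulation — a quick sketch of that equivalence (my own, not from the thread; whether the log-density bookkeeping inside a sampler matches as well is exactly the question raised above):

```python
import math
import random

random.seed(0)
mu, sigma, N = 1.5, 2.0, 200_000

# non-centered draws: sample z ~ normal(0, 1), then shift and scale
draws = [mu + sigma*random.gauss(0.0, 1.0) for _ in range(N)]

mean = sum(draws)/N
std = math.sqrt(sum((x - mean)**2 for x in draws)/N)
print(abs(mean - mu) < 0.05, abs(std - sigma) < 0.05)  # True True at this sample size
```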