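data {
  int J; // number of schools
  vector[J] y; // estimated treatment effects
  vector<lower=0>[J] sigma; // s.e.'s of effect estimates
}
parameters { real mu; real<lower=0> tau; vector[J] theta; } // declarations inferred from the model block
model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
  tau ~ gamma(2, 2./50.); // alpha=2, expectation = 50
}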
I ran this on the 8 schools and, to be honest, it didn’t remove all the geometry problems but I think it’s a lot better than the centered parameterization without the prior. And, yes, you could just use the non-centered parameterization here, but the point here is to try to avoid having to do that, given that the ncp can have its own issues (for example, when data are rich).
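For intuition about what the gamma(2, 2/50) prior is doing, here is a small Python sketch (a side check, not part of the Stan program; the helper name `gamma_logpdf` is my own). It assumes Stan's shape/rate convention, under which the density is proportional to tau * exp(-tau/25): it goes to zero linearly as tau approaches 0, which is what keeps the sampler out of the neck of the funnel, and its mean is alpha/beta = 50.

```python
import math

def gamma_logpdf(x, alpha, beta):
    """Log density of Gamma(alpha, beta), shape/rate convention (as in Stan)."""
    if x <= 0:
        return -math.inf
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(x) - beta * x)

alpha, beta = 2.0, 2.0 / 50.0
prior_mean = alpha / beta        # shape / rate
prior_mode = (alpha - 1) / beta  # formula valid for alpha > 1

print(f"mean = {prior_mean:.1f}, mode = {prior_mode:.1f}")
for tau in (0.01, 0.1, 1.0, 25.0, 50.0, 100.0):
    print(f"tau = {tau:>6}: log p(tau) = {gamma_logpdf(tau, alpha, beta):.3f}")
```

The log density drops without bound as tau shrinks toward 0, while the penalty across the range of plausible tau values is mild.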
Another quick proposal:
Classical results say that the posterior is robust (insensitive) to the hyperprior, but sometimes a Zap will bias the posterior if there is indeed posterior mass around 0. As Dan wrote in a blog post:
It gives you a massively different set of values
In the spirit of simulated tempering, we could use importance sampling to correct that bias, except that we don't have any draws near 0.
So how about averaging two models: tau = 0 and tau ~ gamma(2, 2/50)? Since we have already included tau = 0 in the first case, I might even suggest a more informative Zap, say inv-gamma(5, 5).
Then inv-gamma(5000,5000) I guess.
A naive implementation of tempering, or importance sampling, or indeed BMA (they should be equivalent in this context) will fail because the posterior energy spreads of these two models do not overlap (which I guess can be an alternative definition of a funnel?). But we also have stacking in the toolbox.
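A toy illustration of the overlap problem, in Python: two unit-variance Gaussians stand in for the two posteriors (that is purely my simplification, and `importance_ess` is a made-up helper name). Draws from N(0, 1) are reweighted toward N(shift, 1), and the effective sample size of the importance weights is reported.

```python
import math
import random

def importance_ess(shift, n=2000, seed=1):
    """Effective sample size when N(0,1) draws are reweighted to N(shift,1)."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # log weight = log target - log proposal; normalizing constants cancel
    log_w = [-0.5 * (x - shift) ** 2 + 0.5 * x ** 2 for x in draws]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    return sum(w) ** 2 / sum(v * v for v in w)

for shift in (0.0, 1.0, 3.0, 6.0):
    print(f"shift = {shift}: ESS ~ {importance_ess(shift):.1f} of 2000 draws")
```

As the two distributions separate, a handful of draws carry nearly all the weight, which is the failure mode described above.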
To sum up: adjust the bias of the Zap by tempering; replace importance sampling with stacking.
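A minimal sketch of what stacking two models could look like, in Python. The assumption is that each model supplies pointwise log predictive densities for the same held-out points (in practice these would typically be leave-one-out estimates); the function name `stacking_weight` and the numbers below are made up purely to exercise the code, they are not results from the 8 schools.

```python
import math

def stacking_weight(lpd_a, lpd_b, grid=1001):
    """Weight w on model A maximizing sum_i log(w*p_a[i] + (1-w)*p_b[i])."""
    def score(w):
        total = 0.0
        for la, lb in zip(lpd_a, lpd_b):
            total += math.log(w * math.exp(la) + (1.0 - w) * math.exp(lb))
        return total
    # Simple grid search over w in [0, 1]; the objective is concave in w.
    candidates = [i / (grid - 1) for i in range(grid)]
    return max(candidates, key=score)

# Hypothetical pointwise log predictive densities (illustrative only).
lpd_tau_zero = [-1.0, -3.0, -1.2, -2.8]
lpd_zap = [-3.0, -1.0, -2.9, -1.1]
w = stacking_weight(lpd_tau_zero, lpd_zap)
print(f"weight on tau = 0 model: {w:.3f}, weight on Zap model: {1 - w:.3f}")
```

Unlike importance sampling between the two posteriors, this only needs each model's own predictive densities, so it does not depend on the posteriors overlapping.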
Yes, I like the idea of zap plus model averaging. This is in the spirit of my blog post where I suggested that we consider the funnel as a sort of discontinuous or multimodal posterior–it’s not actually multimodal, it doesn’t have 2 different modes, but it does have 2 different zones of curvature–and making the discreteness overt, as it were. The only tricky thing will come when we have many group-level variance parameters, as this gives us a mixture over 2^K modes.
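To make the 2^K count concrete, here is a tiny Python sketch (the name `variance_configurations` and the "zero"/"free" labels are my own shorthand): each of the K group-level scale parameters is either pinned at zero or left free under the Zap, and every combination is one component of the mixture.

```python
from itertools import product

def variance_configurations(k):
    """All 2^k ways to set each of k group-level scales to 'zero' or 'free'."""
    return list(product(("zero", "free"), repeat=k))

for config in variance_configurations(2):
    print(config)

for k in (1, 2, 5, 10):
    print(f"K = {k}: {2 ** k} mixture components")
```

Fitting every component separately is fine for K = 1 or 2, which is why the many-variance-parameter case is the tricky one.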