Prior recommendation for scale parameters in hierarchical models too strong?

There are two things that can be very misleading here.

  1. Posterior probability mass is what matters, not density. Mass is density integrated over volume. In high dimensions, you usually don’t sample anywhere near the highest-density regions. For more intuition-building exercises, see my case study:

http://mc-stan.org/users/documentation/case-studies/curse-dims.html

Now in the case of the half-normal, the interval [0, 0.5] has higher probability than [0.5, 1.0], so that’s not the issue here. The issue is that Pr[sigma > 1] is far from negligible under a standard half-normal: Pr[sigma < 1] is about 0.68 and Pr[sigma > 1] is about 0.32, so roughly a third of the mass sits above 1. This is the kind of thing we want to concentrate on, not where the mode is.

  2. Posterior means get shifted by truncation. So even though the mode of a half-normal is at zero, the mean is pushed to the right. In general, when you truncate on the left, mass gets redistributed and the mean shifts to the right. That’s how the mean of a half-normal(0, 1) winds up at sqrt(2 / pi), which is about 0.8. This is why truncated interval priors can be so biased compared to their untruncated sources.
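Both points are easy to check numerically. Here’s a quick sketch in Python using only the standard library: the CDF of a standard half-normal at x is erf(x / sqrt(2)), and its mean is sqrt(2 / pi).

```python
import math

# Standard half-normal: |Z| where Z ~ normal(0, 1).
# Its CDF at x >= 0 is erf(x / sqrt(2)).
p_below_1 = math.erf(1 / math.sqrt(2))  # Pr[sigma < 1]
p_above_1 = 1 - p_below_1               # Pr[sigma > 1]

# Mean of a half-normal(0, 1) is sqrt(2 / pi).
mean = math.sqrt(2 / math.pi)

print(round(p_below_1, 3), round(p_above_1, 3), round(mean, 3))
# → 0.683 0.317 0.798
```

So about a third of the prior mass is above 1, and the prior mean sits near 0.8 even though the mode is at 0.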

What we recommend instead is scaling the parameters and using the default priors. If you can’t scale the parameters, then you definitely need to scale the priors.
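As a concrete sketch of what that means in practice (the data and variable names here are hypothetical, not from the original post):

```python
import statistics

# Hypothetical outcome measured in grams; on this raw scale,
# a default prior like normal(0, 1) on a scale parameter is
# far too narrow.
y_grams = [5200.0, 4800.0, 5100.0, 4950.0, 5050.0]

# Option 1: rescale the data to roughly unit scale, then use
# the default priors unchanged.
sd = statistics.stdev(y_grams)
y_scaled = [y / sd for y in y_grams]

# Option 2: if you can't rescale the data, rescale the prior
# instead, e.g. sigma ~ normal(0, sd) rather than normal(0, 1).
prior_scale = sd
```

Either way, the prior and the data end up on compatible scales; the first option is preferable because it also tends to help the sampler.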

As the value approaches zero, so does the density (for shape alpha > 1; with alpha < 1 the gamma density actually diverges at zero):

lim_{s -> 0} gamma(s | alpha, beta) = 0.

The lognormal has the same property, for any choice of its parameters. Andrew’s papers cover the rate at which the density approaches zero and what effect that has on the posterior.
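You can see this behavior by evaluating the densities near zero. This sketch codes the standard gamma and lognormal pdf formulas directly:

```python
import math

def gamma_pdf(s, alpha, beta):
    # Gamma density: beta^alpha * s^(alpha - 1) * exp(-beta * s) / Gamma(alpha)
    return beta**alpha * s**(alpha - 1) * math.exp(-beta * s) / math.gamma(alpha)

def lognormal_pdf(s, mu, sigma):
    # Lognormal density: exp(-(log s - mu)^2 / (2 sigma^2)) / (s * sigma * sqrt(2 pi))
    return (math.exp(-(math.log(s) - mu)**2 / (2 * sigma**2))
            / (s * sigma * math.sqrt(2 * math.pi)))

# With shape alpha > 1 the gamma density vanishes as s -> 0;
# the lognormal density vanishes as s -> 0 for any parameters.
for s in (0.1, 0.01, 0.001):
    print(s, gamma_pdf(s, 2.0, 1.0), lognormal_pdf(s, 0.0, 1.0))
```

Both densities shrink toward zero as s shrinks, which is what pulls mass away from sigma = 0 in the posterior.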
