Types of hyperpriors in Stan

It seems you have similar problems to mine. I also work regularly with hierarchical models that need constrained parameters, and I have struggled to find good recommendations on this topic or methods that work well (see also my question on this issue). Maybe we could share some experiences on this topic here.

Like you, I first tried an exponential hierarchical prior on the individual parameters, but I quickly moved away from it. As far as I could see from my experiments, the exponential prior has the drawback that it constrains the individual parameters from one side only: it makes high values unlikely but does nothing to discourage low ones. So group-level information is only propagated if it indicates some kind of upper bound on the individual values. In addition, if rate_beta becomes very high, indicating some kind of pooling, this simultaneously increases the prior density of very low values, which led to distortions in some of my models. For example, I had models where the individual parameters seemed to lie in the range (95% HDI) of [1,2], which is hard to capture with an exponential distribution: its density is highest at 0, so its HDI must always be of the form [0,X]. I also found that the boundary-avoiding tendency of the log-normal seems to help the models in this situation. Finally, with the log-normal the variance and mean can be independent (depending on how the parameters are sampled), and this can lead to pooling independently of the estimated hierarchical mean.
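To make the HDI argument concrete, here is a minimal Python sketch (standard library only) that compares the 95% HDI of an exponential prior with that of a log-normal prior. The function names and the parameter values (rate 2.0, log-scale mean log(1.4), log-scale standard deviation 0.17) are purely illustrative assumptions and are not taken from any of the models discussed in this thread.

```python
import math
from statistics import NormalDist


def exponential_hdi(rate, mass=0.95):
    """HDI of an exponential distribution.

    The density is strictly decreasing, so the shortest interval
    holding `mass` always starts at 0.
    """
    return 0.0, -math.log(1.0 - mass) / rate


def lognormal_hdi(mu, sigma, mass=0.95, steps=2000):
    """Approximate HDI of a log-normal by a grid search.

    Scans the lower tail probability p and keeps the narrowest
    interval [q(p), q(p + mass)], where q is the quantile function.
    """
    std = NormalDist()
    best = None
    for i in range(1, steps):
        p = (1.0 - mass) * i / steps
        lo = math.exp(mu + sigma * std.inv_cdf(p))
        hi = math.exp(mu + sigma * std.inv_cdf(p + mass))
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


# Hypothetical values, chosen only for illustration.
print("exponential:", exponential_hdi(rate=2.0))
print("log-normal: ", lognormal_hdi(mu=math.log(1.4), sigma=0.17))
```

Whatever rate is chosen, the exponential interval has its lower end at 0, whereas the log-normal interval can sit strictly away from 0, which is the situation described above.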

I have since moved to a log-normal on the individual parameters. Since the log-normal has a mode that is different from 0, I found that this distribution can handle these kinds of situations much better. However, I am still struggling to find good hyperpriors on the parameters of this log-normal, so I have a lot of problems with poor convergence of the model.
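For reference, these are the standard log-normal summaries behind the point about the mode. The symbols are my own notation, not something from the models above: $\theta$ is an individual parameter with $\log\theta \sim \mathcal{N}(\mu, \sigma^2)$.

```latex
\begin{aligned}
\operatorname{mode}(\theta) &= e^{\mu - \sigma^{2}} > 0, \\
\mathbb{E}[\theta] &= e^{\mu + \sigma^{2}/2}, \\
\operatorname{Var}[\theta] &= \left(e^{\sigma^{2}} - 1\right) e^{2\mu + \sigma^{2}}, \\
\operatorname{CV}[\theta] &= \sqrt{e^{\sigma^{2}} - 1}.
\end{aligned}
```

The mode is strictly positive for any finite $\mu$ and $\sigma$, and the coefficient of variation depends on $\sigma$ alone, so the relative spread of the individual parameters can shrink without the location $\mu$ having to move.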

I think your better experience with the restricted normal might be due to a similar effect. Since this distribution can also have its mode away from 0, it can restrict the individual parameters from both sides. I have thought about this idea before, but I have not yet tried it in any of my models, because I found the boundary-avoiding property helpful and this distribution does not have it.
