Types of hyperpriors in Stan

Hi all,

This may not be a Stan-specific question (and may be very basic), but I hope the community can still help.

I want to estimate two linear regression parameters (intercept, \alpha and slope, \beta) by partially pooling information from exchangeable groups. I understand, conceptually at least, the process of using a vector to collect separate group estimates and specifying a shared prior model to link them.

I have only ever seen examples of this global prior being normally distributed, with two hyperparameters (mean, \mu and standard deviation, \sigma). Here \sigma describes the variability between groups, with very low values approaching fully pooled models and very high values approaching a set of independent models.
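As a rough illustration of how \sigma sets the degree of pooling, here is a minimal Python sketch of the textbook normal-normal case. It assumes that \mu, \sigma and the group's standard error are all known, which is a simplification: in a Stan fit the hyperparameters are estimated jointly with everything else. The function name and the numbers are hypothetical.

```python
def partial_pooling_estimate(group_mean, group_se, mu, sigma):
    """Posterior mean of one group's parameter in a normal-normal model.

    Assumes mu, sigma and group_se are known; this is an illustration,
    not what Stan computes when the hyperparameters are sampled.
    """
    # Weight on the group's own estimate: near 0 for small sigma
    # (full pooling), near 1 for large sigma (independent estimates).
    weight = sigma**2 / (sigma**2 + group_se**2)
    return weight * group_mean + (1.0 - weight) * mu


if __name__ == "__main__":
    for sigma in (0.01, 1.0, 100.0):
        estimate = partial_pooling_estimate(
            group_mean=2.0, group_se=1.0, mu=0.0, sigma=sigma
        )
        print(f"sigma={sigma:>6}: pooled estimate = {estimate:.4f}")
```

With a tiny sigma the estimate collapses onto mu, and with a huge sigma it stays at the group's own mean.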

I am interested in the case where \beta should be constrained to be positive.
I therefore wanted to use an exponential (global) prior. Since the mean and variance would then no longer be independent, I wondered how the single distribution parameter controls the extent of pooling.
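To make the coupling concrete, the sketch below uses the closed-form mean, standard deviation and quantiles of an exponential distribution. It is only an illustration with a hypothetical helper name. The rate rescales the whole distribution, so the standard deviation always equals the mean, and the ratio of the 97.5% to the 2.5% quantile is the same (about 146) for every rate.

```python
import math


def exponential_summary(rate):
    """Mean, standard deviation and central 95% interval of Exponential(rate)."""
    mean = 1.0 / rate
    sd = 1.0 / rate  # identical to the mean for every rate
    lower = -math.log(1.0 - 0.025) / rate  # 2.5% quantile
    upper = -math.log(1.0 - 0.975) / rate  # 97.5% quantile
    return mean, sd, lower, upper


if __name__ == "__main__":
    for rate in (0.1, 1.0, 10.0):
        mean, sd, lower, upper = exponential_summary(rate)
        print(
            f"rate={rate:>4}: mean={mean:.3f} sd={sd:.3f} "
            f"sd/mean={sd / mean:.1f} upper/lower={upper / lower:.1f}"
        )
```

Unlike \sigma in the normal case, no value of the rate makes the groups tightly clustered around a non-zero value. The relative spread is fixed.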

In the extracts below:

  • I first used a normal global prior (with <lower = 0> in the parameters block), which appears to work as expected.
  • I then tried to use an exponential global prior for \beta (for which I have tried many different models for rate_beta), but I do not see any regression to the population mean or reduced uncertainty in the marginal posterior.

I wanted to check whether there is a fundamental issue with non-Gaussian hierarchical priors.


Standard example, with a normal global prior (works well!)

model {
  z ~ normal(alpha[group] + beta[group] .* x, sigma);
  
  // Hierarchical Priors
  alpha ~ normal(mu_alpha, sigma_alpha); 
  beta ~ normal(mu_beta, sigma_beta);

  // Hyperpriors
  mu_alpha ~ normal(0, 1);
  sigma_alpha ~ exponential(1);
  mu_beta ~ normal(0, 1);
  sigma_beta ~ exponential(1);
  
  // Remaining Priors
  sigma ~ exponential(1);  
...
} 

Exponential prior for Beta - doesn’t appear to be pooling

model {
  z ~ normal(alpha[group] + beta[group] .* x, sigma);
  
  // Hierarchical Priors
  alpha ~ normal(mu_alpha, sigma_alpha); 
  beta ~ exponential(rate_beta);

  // Hyperpriors
  mu_alpha ~ normal(0, 1);
  sigma_alpha ~ exponential(1);
  rate_beta ~ exponential(1);
  
  // Remaining Priors
  sigma ~ exponential(1);  
...
} 

Thanks all!

If I’m reading this right, you’ve defined independent exponential priors on rate_beta, so there isn’t any pooling. Would something like the following work?

rate_beta ~ exp(rate);
rate ~ normal(mu_rate, sigma_rate);
mu_rate ~ normal(0, 1);
sigma_rate ~ exponential(1);

Hi @kholsinger,

Thank you for your reply - much appreciated!

Apologies for taking so long to respond, but I had become busy with other tasks and then had some difficulties with software updates.

I tried running the same model with the adjustments below (adding the extra level that you recommended to the beta parameters), and it produced results very similar to the simple independent models: no pooling.

...
model {
 
  z ~ normal(alpha[group] + beta[group] .* x, sigma);

  // Hierarchical Priors:
  alpha ~ normal(mu_alpha, sigma_alpha);
  beta ~ exponential(rate_beta);
  
  // Hyperpriors:
  mu_alpha ~ normal(0, 1);
  sigma_alpha ~ exponential(1);
  
  rate_beta ~ normal(mu_rate_beta, sigma_rate_beta);
  mu_rate_beta ~ normal(0, 1);
  sigma_rate_beta ~ exponential(1);

  // Remaining Priors  
  sigma ~ exponential(1);
} 
...

As I previously mentioned, I have a working model with Gaussian hyperpriors for beta, where I have just constrained the necessary values to be positive rather than using an exponential distribution.

I’m wondering if we need Gaussian hyperpriors for pooling between exchangeable groups?

It seems you have similar problems to mine. I also work regularly with hierarchical models that need constrained parameters, and I have struggled to find good recommendations on this topic or methods that work well (see also my question on this issue). Maybe we could share some experiences on this topic here.

Like you, I first tried an exponential hierarchical prior on the individual parameters, but I quickly moved away from it. As far as I could see from my experiments, the exponential prior has the drawback that it only limits the individual parameters from one side: it makes high values unlikely, but not low ones. Group-level information is therefore only propagated if it indicates some kind of upper bound on the individual values. In addition, if rate_beta becomes very high (which would indicate strong pooling), this simultaneously increases the prior density for very low values, which led to distortions in some of my models. For example, I had models where the individual parameters seemed to lie in the range [1, 2] (95% HDI), and this is hard to capture with an exponential distribution, because its HDI must always be of the form [0, X]. I also found that the boundary-avoiding tendency of the log-normal helps in this situation. Finally, with the log-normal the variance and mean can be independent (depending on how the parameters are sampled), which can actually lead to pooling independently of the estimated hierarchical mean.
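The [1, 2] example can be checked numerically. The sketch below uses hypothetical helper names and hypothetical log-normal parameters. It searches a grid of rates for the exponential that puts the most prior mass on [1, 2] and compares that with a log-normal centred on the same interval. Analytically, the exponential mass exp(-rate) - exp(-2 * rate) peaks at rate = log(2), where it equals 0.25, so no exponential can place more than a quarter of its mass there.

```python
import math


def exponential_mass(lo, hi, rate):
    """P(lo < beta < hi) when beta ~ Exponential(rate)."""
    return math.exp(-rate * lo) - math.exp(-rate * hi)


def lognormal_mass(lo, hi, mu, sigma):
    """P(lo < beta < hi) when log(beta) ~ Normal(mu, sigma)."""
    def cdf(x):
        z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))
    return cdf(hi) - cdf(lo)


def best_exponential_rate(lo, hi, grid_size=1000, step=0.01):
    """Grid search for the rate that puts the most mass on [lo, hi]."""
    rates = [step * k for k in range(1, grid_size + 1)]
    return max(rates, key=lambda rate: exponential_mass(lo, hi, rate))


if __name__ == "__main__":
    rate = best_exponential_rate(1.0, 2.0)
    mass = exponential_mass(1.0, 2.0, rate)
    print(f"best exponential rate ~ {rate:.2f}, mass on [1, 2] = {mass:.3f}")
    # Hypothetical log-normal centred on the geometric midpoint of 1 and 2.
    mass = lognormal_mass(1.0, 2.0, 0.5 * math.log(2.0), 0.15)
    print(f"log-normal mass on [1, 2] = {mass:.3f}")
```

The log-normal parameters here are picked by hand for the illustration and are not estimates from any model.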

I have since moved to a log-normal on the individual parameters. Because the log-normal has its mode away from 0, I found that it handles these situations much better. However, I am still struggling to find good hyperpriors for the parameters of this log-normal, so I have a lot of problems with poor convergence of the model.
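When thinking about hyperpriors for a log-normal, it may help to see what a given (mu, sigma) pair implies on the original scale. The following is a minimal sketch, not a fitted model. The helper names, the seed, the eight groups and the median of 1.5 are all hypothetical. It also shows the non-centred construction beta_j = exp(mu + sigma * z_j), in which sigma alone controls how far the group values spread around exp(mu).

```python
import math
import random


def lognormal_summaries(mu, sigma):
    """Mode, median and mean of beta when log(beta) ~ Normal(mu, sigma)."""
    return {
        "mode": math.exp(mu - sigma**2),
        "median": math.exp(mu),
        "mean": math.exp(mu + 0.5 * sigma**2),
    }


def group_slopes(mu, sigma, z):
    """Non-centred construction: beta_j = exp(mu + sigma * z_j)."""
    return [math.exp(mu + sigma * z_j) for z_j in z]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    z = [rng.gauss(0.0, 1.0) for _ in range(8)]  # eight hypothetical groups
    mu = math.log(1.5)  # hypothetical population median of 1.5
    for sigma in (1.0, 0.3, 0.05):
        slopes = group_slopes(mu, sigma, z)
        print(
            f"sigma={sigma:>4}: slopes span "
            f"[{min(slopes):.3f}, {max(slopes):.3f}]",
            lognormal_summaries(mu, sigma),
        )
```

As sigma shrinks, every slope stays positive and the groups collapse towards exp(mu), which is the pooling behaviour that the single exponential rate cannot reproduce.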

I think your better experience with the restricted normal might be due to a similar effect. Since this distribution can also have its mode away from 0, it can restrict the individual parameters from both sides. I have thought about this idea before but have not yet tried it in any of my models, because I found the boundary-avoiding property helpful and this distribution does not have it.
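The difference in boundary behaviour can be seen by evaluating both densities close to zero. This is a small sketch with hypothetical helper names and hyperparameter values, chosen so that both priors have their bulk near 0.5. The normal truncated at zero keeps a clearly positive density at the boundary, while the log-normal density vanishes there.

```python
import math


def truncated_normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) truncated to x >= 0."""
    if x < 0.0:
        return 0.0
    z = (x - mu) / sigma
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # Probability that the untruncated normal is positive.
    # Note that this normalising term depends on mu and sigma.
    mass_above_zero = 0.5 * (1.0 + math.erf(mu / (sigma * math.sqrt(2.0))))
    return phi / mass_above_zero


def lognormal_pdf(x, mu, sigma):
    """Density of beta at x when log(beta) ~ Normal(mu, sigma)."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    for x in (1e-3, 1e-2, 1e-1, 0.5):
        tn = truncated_normal_pdf(x, 0.5, 0.5)
        ln = lognormal_pdf(x, math.log(0.5), 0.5)
        print(f"x={x:<6}: truncated normal={tn:.4f}  log-normal={ln:.4g}")
```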
