Boundary-Avoiding Priors

This is a follow-up question to a discussion I started on this thread. I’m trying to figure out what prior I should use for the dispersion parameter of a negative binomial distribution (phi).

I did a bit of googling and reading and found a short discussion of this on pages 141-142 of the manual. Based on that, and this paper, I changed my model to:

data {
  int<lower=0> I;                   // number of practices 
  int<lower=0> T;                   // number of time periods 
  int<lower=1> y[I*T];              // Count outcome
  vector[I * T] n;                  // number of patients in period t
  int<lower=1,upper=I> practice[I*T];          //
  vector<lower=-1,upper=4>[I*T] time;          //
  int<lower=1,upper=5> index_time[I*T];        //
}


parameters {
  real<lower=0> phi;            // neg. binomial dispersion parameter
  vector[I] a_std;              //
  real<lower=0> sigma_a;        // 
  vector[I] b_std;              //
  real<lower=0> sigma_b;        // 
  vector[T] gamma;              // year specific intercept   
}


transformed parameters {
  vector[I] a; 
  vector[I] b; 
  a = (sigma_a * a_std)/100 ;            // Matt trick
  b = (sigma_b * b_std)/100 ;            // Matt trick
}

model {
  phi ~ gamma(2, 0.1);
  sigma_a ~ normal(0, 1);
  sigma_b ~ normal(0, 1);
  a_std ~ normal(0,1);
  b_std ~ normal(0,1);
  gamma ~normal(0,1);
  y ~ neg_binomial_2_log(log(n) + gamma[index_time] + a[practice] + b[practice] .* time, phi);   // likelihood
}


generated quantities {
  vector[I * T] y_sim;
  vector[I * T] log_lik;
  vector[I * T] eta = log(n) + gamma[index_time] + a[practice] + b[practice] .* time;
 for (it in 1:I * T) {
  y_sim[it] = neg_binomial_2_log_rng(eta[it], phi);
 }
 for(it in 1:I*T){
    log_lik[it] = neg_binomial_2_log_lpmf(y[it] | eta[it], phi);
 }
}

My question is, how do I decide the parameter values for gamma if I want to use weakly informative priors. Is gamma(2,0.1) reasonable or should I use gamma(2,0.5) or maybe something completely different?

We typically refer to weakly informative priors in terms of containment. We chose a scale, s, and then define a prior that places most of its probability mass between -s and s if the parameter is unconstrained or 0 and s if it’s constrained to be positive.

Arguably boundary-aversion is additional information that pushes the prior into more fully informative territory, but nobody has really defined these words so don’t let the vocabulary be a distraction. The point is that a boundary-avoiding prior needs two scales – a lower scale that indicates how far away from zero is reasonable and an upper scale that indicates how close to infinity is reasonable.

Given those two scales you can then construct an appropriate prior that places most of its probability mass in between. For an example see Section 4 of https://betanalpha.github.io/assets/case_studies/gp_part3/part3.html.

My recommendation would be the PC prior on the over-dispersion parameter. It can be found here. rejoinder.pdf (236.0 KB)

R-code for the log-density:

INLA::inla.pc.dgamma function (x, lambda = 1, log = FALSE) 
{
    inv.x = 1/x
    d = sqrt(2 * (log(inv.x) - psigamma(inv.x)))
    ld = -lambda * d - log(d) + log(lambda) - 2 * log(x) + log(psigamma(inv.x, 
        deriv = 1) - x)
    return(if (log) ld else exp(ld))
}

This is the default in INLA and it’s worked quite well in applications.

2 Likes

Or, as @betanalpha suggested, use a prior that has a light left tail and a heavy right tail. An inverse-gamma or a NIG is often a good substitute (even if you do have to work harder to find well-calibrated parameters)

Thanks @betanalpha. Is there any consensus about what two scales are reasonable to use for the dispersion parameter of a negative binomial? This is the first time I try to fit this type of model with Stan so I honestly have no clue what are reasonable values.

I noticed that you use a check_all_diagnostics function in your code. Where can I get that function?

Thanks @anon75146577. Any chance that you can share some stan code that shows how to implement this?

If I use phi ~ inv_gamma(alpha, beta), what values would you recommend for alpha and beta?