This is a follow-up question to a discussion I started on this thread. I’m trying to figure out what prior I should use for the dispersion parameter of a negative binomial distribution (phi).

I did a bit of googling and reading and found a short discussion of this on pages 141-142 of the manual. Based on that, and this paper, I changed my model to:

data {
int<lower=0> I; // number of practices
int<lower=0> T; // number of time periods
int<lower=1> y[I*T]; // Count outcome
vector[I * T] n; // number of patients in period t
int<lower=1,upper=I> practice[I*T]; //
vector<lower=-1,upper=4>[I*T] time; //
int<lower=1,upper=5> index_time[I*T]; //
}
parameters {
real<lower=0> phi; // neg. binomial dispersion parameter
vector[I] a_std; //
real<lower=0> sigma_a; //
vector[I] b_std; //
real<lower=0> sigma_b; //
vector[T] gamma; // year specific intercept
}
transformed parameters {
vector[I] a;
vector[I] b;
a = (sigma_a * a_std)/100 ; // Matt trick
b = (sigma_b * b_std)/100 ; // Matt trick
}
model {
phi ~ gamma(2, 0.1);
sigma_a ~ normal(0, 1);
sigma_b ~ normal(0, 1);
a_std ~ normal(0,1);
b_std ~ normal(0,1);
gamma ~normal(0,1);
y ~ neg_binomial_2_log(log(n) + gamma[index_time] + a[practice] + b[practice] .* time, phi); // likelihood
}
generated quantities {
vector[I * T] y_sim;
vector[I * T] log_lik;
vector[I * T] eta = log(n) + gamma[index_time] + a[practice] + b[practice] .* time;
for (it in 1:I * T) {
y_sim[it] = neg_binomial_2_log_rng(eta[it], phi);
}
for(it in 1:I*T){
log_lik[it] = neg_binomial_2_log_lpmf(y[it] | eta[it], phi);
}
}

My question is, how do I decide the parameter values for gamma if I want to use weakly informative priors. Is gamma(2,0.1) reasonable or should I use gamma(2,0.5) or maybe something completely different?

We typically refer to weakly informative priors in terms of containment. We chose a scale, s, and then define a prior that places most of its probability mass between -s and s if the parameter is unconstrained or 0 and s if it’s constrained to be positive.

Arguably boundary-aversion is additional information that pushes the prior into more fully informative territory, but nobody has really defined these words so don’t let the vocabulary be a distraction. The point is that a boundary-avoiding prior needs two scales – a lower scale that indicates how far away from zero is reasonable and an upper scale that indicates how close to infinity is reasonable.

Or, as @betanalpha suggested, use a prior that has a light left tail and a heavy right tail. An inverse-gamma or a NIG is often a good substitute (even if you do have to work harder to find well-calibrated parameters)

Thanks @betanalpha. Is there any consensus about what two scales are reasonable to use for the dispersion parameter of a negative binomial? This is the first time I try to fit this type of model with Stan so I honestly have no clue what are reasonable values.

I noticed that you use a check_all_diagnostics function in your code. Where can I get that function?

Thanks @anon75146577. Any chance that you can share some stan code that shows how to implement this?

If I use phi ~ inv_gamma(alpha, beta), what values would you recommend for alpha and beta?