I am making a hierarchical negative binomial model and now I also want to compute the log likelihood to be able to compute the waic etc.

My model and generated quantities blocks now looks like the following:

model {
mu ~ normal(alpha, sigma_mu);
sigma_mu ~ normal(0, 1);
alpha ~ normal(log(4), 1);
beta ~ normal(-0.25, 1);
inv_phi ~ normal(0, 1);
y ~ neg_binomial_2_log(mu[ID] + beta * catalogue_price, phi);
}
generated quantities {
int y_rep[N];
int y_pred[N];
vector[N] log_lik;
for (n in 1:N) {
real eta_n = mu[ID[n]] + beta * catalogue_price[n];
y_rep[n] = neg_binomial_2_log_rng(eta_n, phi);
log_lik[n] = neg_binomial_2_lpmf(y_rep[n] | eta_n, phi);
}
}

Since, I am completely new to Stan I doubt whether the neg_binomial_2_lpmf() function is the correct one for the computation of the log likelihood. But is this correct?
Could someone maybe also explain what the differences are between the distribution functions such as neg_binomial_2_log, neg_binomial_2_lpmf and neg_binomial_2_cdf(). I am not completely sure whether I understand the usage correctly.

I don’t see a neg_binomial_2_log. I think you may be wanting to use an _rng function to generate y_rep[n], like this:

y_rep[n] = neg_binomial_2_rng(eta_n, phi);

neg_binomial_2_lpmf(ints y | reals mu, reals phi): The negative binomial probability mass of n given location mu and precision phi.

neg_binomial_2_cdf(ints n, reals mu, reals phi): The negative binomial cumulative distribution function of n given location mu and precision phi.

I’m wondering if the reference manual has a typo on the _lpmf; the argument is ints y but the description says n. Also, I think it should be “The negative binomial log probability mass” since _lpmf stands for log probability mass function, which is what you want for computing the log likelihood.

It should be, at least in the reference guide for Stan 2.17 it’s right under neg_binomial_2.

On top of all the confusion generated (to me it was at some point, at least) by all the different parameterizations of the negative binomial distribution, there could be another layer of confusion between those two distributions mentioned. Here’s what it should be, and I hope I’m getting this right:

y ~ neg_binomial_2(mu, phi) is the log of the ‘negative binomial likelihood’ of observation vector \mathbf{y} according to the parameterization in the reference guide, and it’s equivalent to a vectorized form of neg_binomial_2_lpmf(ints y | reals mu, reals phi), like the loop over y_rep[n]

y ~ neg_binomial_2_log(eta, phi) is an alternative parameterization where \eta = log(\mu), and is equivalent to looping over neg_binomial_2_log_lpmf(ints y | reals eta, reals phi).

Apparently the latter can be useful. if you are using a log link function like in a GLM or alike, so you can just use that function directly and it can be faster, but it’s otherwise equivalent to exponentiating the predicted function and using the former.

So @c5v, if you are using neg_binomial_2_log and neg_binomial_2_lpmf together as equivalents it would appear that you are mixing two different things, and it should be solve by using either pair of functions above (_cdf functions shoudl follow the same rule, although I’m not they are implemented for the latter).
Without further information or details of your model it’s not completely clear to me what you are trying to do with y_rep[n] = neg_binomial_2_log(eta_n, phi), if you are trying to get a random number than the suggestion above by @ScottAlder of using neg_binomial_2_rng should be what you are looking for.

I just checked, and it looks like it’s neg_binomial_2_log in the 2.17.0 stan-reference along with two other suffixes (_lpmf_rng). But in the 2.18.0 functions-reference that I linked to, there is no _log suffix, just neg_binomial_2 along with _lpmf_cdf_lcdf_lccdf and _rng. That’s a lot to keep straight!

You’re right! in the 2.18.0 functions reference the _log suffix for the alternative parameterization is down in its own section. Still, there are 9 functions associated with neg_binomial_2 and 6 associated with neg_binomial

@ScottAlder
I updated my question by adding my model block and just saw that I made a mistake in my generated quantities block. Here, for y_rep, which are just the estimated y values to compute in-sample performence, I use neg_binomial_2_log_rng() instead of neg_binomial_2_log. So if I understand it correctly I should use neg_binomial_2_log_lpmf to compute the values of the log likelihood?

However, when I use this function I get the error: Exception: neg_binomial_2_log_lpmf: Failures variable[2] is -2147483648, but must be >= 0! and I don’t know why this is going wrong