Poisson and negative binomial regression


I’m trying to fit a regression model to some count data using rstanarm. X1 is a binary covariate, and X2 is a categorical covariate with 21 values. I’m trying both the poisson and negative binomial model, called using the following command,

stan_glm1 <- stan_glm(Count ~ X1 + X2,
                  data = d, family = poisson, offset=log(Offset),
                  prior = normal(0, 1), prior_intercept = normal(0, 1),
                  chains = 4, cores = 4, seed = 123456)
stan_glm2 <- stan_glm(Count ~ X1 + X2,
                  data = d, family = neg_binomial_2, offset=log(Offset),
                  prior = normal(0, 1), prior_intercept = normal(0, 1),
                  chains = 4, cores = 4, seed = 123456)

Both models (seemingly) converge: all R-hat < 1.01, no divergences, trace plots looking good etc. However, the results are widely different. The negative binomial model completely fails the posterior predictive checking, to such a degree that the prediction is several order of magnitude off. The Poisson posterior predictive check fits the data very well. I tried the same model in CmdStan (v.2.15) and it’s the same, so I don’t suspect it’s a problem inherent to rstanarm.

I’m not sure if there is a convenient way to provide the results from the model, but the mean_PPD is as follows:

Poisson mean_PPD 6487.6 17.8
nb mean_PPD 9230.1 4152.5

It could be that the poisson simply fits the model better, but shouldn’t the nb dispersion just go towards very high values, then, and approximate the poisson? Have I missed anything? Unfortunately I am not able to disclose the data.

When you have huge counts, you have to think a lot harder about the prior on the auxiliary variable of the negative binomial model, which is implemented via the prior_aux argument. Its default scale is way too small for data like this.

I would not say my counts are large, but the range might be big. The counts are between 0 and 22000, the offset is between 13600 and 210000.

Is a cauchy(0, 5) really too small?

I would at least try it with some other choices. If you do prior_aux = NULL, do you get results that are similar to the Poisson ones? Also, the RNG for the negative binomial is a bit flaky.

prior_aux = NULL gives the exact same result at the default prior, the cauchy (0, 5), a reciprocal dispersion with median=0.4 and MAD_SD=0.1. Increasing the scale of the cauchy to 10, 15 or 25 did not change results.

I did some more testing. Turns out my choice of priors for coefficients was way off. The MLE gives coefficients in the range of 30-50 for X1, and -1 to 1 for X2. If I increase the prior to something like normal(0, 100) or use QR=TRUE I get similar results.

Is there someway I can re-scale my count data so it better accommodates priors around 0? I tried dividing the offset by 100000, but that only scales the intercept, and not the coefficient for X1.

The QR scaling is good for predictors. It is generally hard to rescale the outcome and keep it a non-negative integer.

1 Like

Yep. I didn’t have much luck either when searching google scholar for rescaling of count data. I’m trying now to run the model in CmdStan with a pretty wide prior on the scale,

sigma_X1 ~ normal+(0, 5)
beta_X1 ~ normal(0, sigma_X1)

Hopefully the better choice of prior will give a better result.

Unfortunately you can’t scale integers the same way that you can scale real numbers. This is why count models with extremely large number of counts typically becomes ungainly. At the same time, when you have so many counts it becomes reasonable to approximate the counts with real numbers…