Poisson and negative binomial regression

David_Westergaard · July 8, 2017, 4:01pm

Hi,

I’m trying to fit a regression model to some count data using rstanarm. X1 is a binary covariate, and X2 is a categorical covariate with 21 levels. I’m trying both a Poisson and a negative binomial model, fitted with the following commands:

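library(rstanarm)

# Poisson regression (log link); log(Offset) enters the linear predictor with its coefficient fixed at 1
stan_glm1 <- stan_glm(Count ~ X1 + X2,
                      data = d, family = poisson, offset = log(Offset),
                      prior = normal(0, 1), prior_intercept = normal(0, 1),
                      chains = 4, cores = 4, seed = 123456)
# Same model with a negative binomial likelihood (rstanarm's neg_binomial_2 family)
stan_glm2 <- stan_glm(Count ~ X1 + X2,
                      data = d, family = neg_binomial_2, offset = log(Offset),
                      prior = normal(0, 1), prior_intercept = normal(0, 1),
                      chains = 4, cores = 4, seed = 123456)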

Both models (seemingly) converge: all R-hat < 1.01, no divergences, good-looking trace plots, etc. However, the results are wildly different. The negative binomial model fails the posterior predictive check completely, with predictions several orders of magnitude off, whereas the Poisson posterior predictive check fits the data very well. I tried the same model in CmdStan (v2.15) and got the same behavior, so I don’t suspect a problem inherent to rstanarm.

I’m not sure if there is a convenient way to provide the results from the model, but the mean_PPD is as follows:

Poisson mean_PPD 6487.6 17.8
nb mean_PPD 9230.1 4152.5

It could be that the Poisson model simply fits the data better, but in that case shouldn’t the NB dispersion parameter just go towards very high values and approximate the Poisson? Have I missed anything? Unfortunately, I am not able to disclose the data.
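A minimal sketch of the intuition behind this question, assuming the NB2 parameterization used by Stan's neg_binomial_2 (mean mu, reciprocal dispersion phi, variance mu + mu^2/phi): the variance-to-mean ratio is 1 + mu/phi, so the negative binomial only looks Poisson once phi is large relative to mu. The mean of 6500 below is borrowed from the Poisson mean_PPD purely as an illustration.

```python
def nb2_variance(mu: float, phi: float) -> float:
    """Variance of a negative binomial with mean mu and reciprocal dispersion phi (NB2)."""
    return mu + mu * mu / phi


def variance_to_mean_ratio(mu: float, phi: float) -> float:
    """Equals 1 for a Poisson; for NB2 it is 1 + mu / phi."""
    return nb2_variance(mu, phi) / mu


mu = 6500.0  # illustrative mean, roughly the Poisson mean_PPD in this thread
for phi in (0.4, 10.0, 1e3, 1e6, 1e9):
    print(f"phi = {phi:>10g}   Var/mean = {variance_to_mean_ratio(mu, phi):.6g}")
```

Under this parameterization, the NB collapses to the Poisson only when phi is orders of magnitude above mu; a posterior for phi near or below 1 implies a variance thousands of times larger than the mean.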

bgoodri · July 8, 2017, 4:58pm

When you have huge counts, you have to think a lot harder about the prior on the auxiliary variable of the negative binomial model, which is implemented via the prior_aux argument. Its default scale is way too small for data like this.

David_Westergaard · July 8, 2017, 5:00pm

I would not say my counts are large, but the range might be big: the counts are between 0 and 22000, and the offset is between 13600 and 210000.

Is a cauchy(0, 5) really too small?

bgoodri · July 8, 2017, 5:40pm

I would at least try it with some other choices. If you set prior_aux = NULL, do you get results that are similar to the Poisson ones? Also, the RNG for the negative binomial is a bit flaky.

David_Westergaard · July 8, 2017, 5:44pm

prior_aux = NULL gives exactly the same result as the default prior, cauchy(0, 5): a reciprocal dispersion with median = 0.4 and MAD_SD = 0.1. Increasing the scale of the Cauchy to 10, 15 or 25 did not change the results.
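To see what a reciprocal dispersion near 0.4 implies, recall that NB2 can be written as a Poisson whose rate is gamma-distributed with shape phi and mean mu. The sketch below (standard library only; mu = 6500 is again borrowed from the Poisson mean_PPD as an illustration, and the seed and number of draws are arbitrary) draws those latent rates. With phi = 0.4, a sizable share of the rates land near zero while a few are enormous, which is one way a posterior predictive check can end up far from the data.

```python
import math
import random
import statistics


def latent_rates(mu: float, phi: float, n: int, seed: int = 1) -> list[float]:
    """Draw n gamma rates with shape phi and scale mu/phi (mean mu, variance mu^2/phi).

    Mixing a Poisson over these rates gives an NB2(mu, phi) count.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(phi, mu / phi) for _ in range(n)]


mu, phi = 6500.0, 0.4  # illustrative values taken from the thread
rates = latent_rates(mu, phi, 20_000)
share_small = sum(r < 100.0 for r in rates) / len(rates)
print(f"share of rates below 100: {share_small:.3f}")
print(f"median rate: {statistics.median(rates):.1f}  (mean is {mu})")
print(f"NB2 sd: {math.sqrt(mu + mu**2 / phi):.0f}  vs Poisson sd: {math.sqrt(mu):.1f}")
```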

David_Westergaard · July 9, 2017, 7:30am

I did some more testing. It turns out my choice of priors for the coefficients was way off. The MLE gives coefficients in the range of 30-50 for X1, and -1 to 1 for X2. If I widen the prior to something like normal(0, 100), or use QR = TRUE, I get similar results.
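A rough illustration of why the prior scale mattered here (a sketch; the threshold of 30 is the low end of the MLE range reported above): a normal(0, 1) prior places essentially no mass on coefficients that large, while normal(0, 100) leaves most of its mass beyond 30 in absolute value. Since these are log-link coefficients, a value of 30 would also correspond to a rate ratio of exp(30), about 1e13.

```python
import math


def prob_abs_exceeds(threshold: float, sd: float) -> float:
    """P(|beta| > threshold) when beta ~ normal(0, sd)."""
    return math.erfc(threshold / (sd * math.sqrt(2.0)))


for sd in (1.0, 10.0, 100.0):
    print(f"normal(0, {sd:g}): P(|beta| > 30) = {prob_abs_exceeds(30.0, sd):.3g}")

# On the log link, a coefficient of 30 multiplies the expected count by exp(30).
print(f"exp(30) = {math.exp(30):.3g}")
```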

Is there some way I can rescale my count data so it better accommodates priors centered around 0? I tried dividing the offset by 100000, but that only changes the intercept, not the coefficient for X1.
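The observation about the offset follows from the log link: with log(mu) = b0 + b1*X1 + log(Offset), dividing the offset by a constant c gives log(Offset/c) = log(Offset) - log(c), which is absorbed entirely by shifting the intercept up by log(c). A small check with hypothetical coefficients (not the fitted values, which aren't shown in the thread):

```python
import math


def linear_predictor(b0: float, b1: float, x1: float, offset: float) -> float:
    """log(mu) = b0 + b1 * x1 + log(offset) under a log link with an offset."""
    return b0 + b1 * x1 + math.log(offset)


c = 100_000.0        # rescaling factor used in the post
b0, b1 = -3.0, 0.5   # hypothetical coefficients, for illustration only

for x1, offset in [(0.0, 13_600.0), (1.0, 210_000.0)]:
    original = linear_predictor(b0, b1, x1, offset)
    # Same fit after dividing the offset by c: only the intercept moves, by +log(c).
    rescaled = linear_predictor(b0 + math.log(c), b1, x1, offset / c)
    print(f"x1={x1:g} offset={offset:g}: {original:.6f} vs {rescaled:.6f}")
```

So rescaling the offset leaves the X1 coefficient untouched; its scale is set by the data and the link function.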

bgoodri · July 9, 2017, 10:27am

The QR scaling is good for predictors. It is generally hard to rescale the outcome and keep it a non-negative integer.
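For reference, a minimal pure-Python sketch of the scaled thin QR reparameterization described in the Stan User's Guide (Q* = Q * sqrt(n - 1), R* = R / sqrt(n - 1)). rstanarm's QR = TRUE applies a scaled QR decomposition of this kind, but its exact internals (for example any centering of predictors) are not reproduced here. The model is fit on theta = R* beta, and beta is recovered by solving R* beta = theta. The design matrix and coefficients below are made up.

```python
import math


def thin_qr(X):
    """Thin QR of an n x k matrix (list of rows) by modified Gram-Schmidt.

    Returns Q (n x k, orthonormal columns) and R (k x k, upper triangular).
    Assumes X has full column rank.
    """
    n, k = len(X), len(X[0])
    q_cols = []
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = [X[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [b - R[i][j] * a for a, b in zip(q, v)]
        R[j][j] = math.sqrt(sum(b * b for b in v))
        q_cols.append([b / R[j][j] for b in v])
    Q = [[q_cols[j][i] for j in range(k)] for i in range(n)]
    return Q, R


def scaled_qr(X):
    """Q* = Q * sqrt(n - 1) and R* = R / sqrt(n - 1), so that Q* R* = X."""
    s = math.sqrt(len(X) - 1)
    Q, R = thin_qr(X)
    return [[q * s for q in row] for row in Q], [[r / s for r in row] for row in R]


def back_solve(R, theta):
    """Solve R beta = theta for upper-triangular R (recovers beta from theta)."""
    k = len(theta)
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (theta[i] - sum(R[i][j] * beta[j] for j in range(i + 1, k))) / R[i][i]
    return beta


# Made-up design: a binary column and a continuous column.
X = [[1.0, 0.5], [0.0, -1.0], [1.0, 2.0], [0.0, 0.3], [1.0, -0.7]]
Q_star, R_star = scaled_qr(X)
beta = [40.0, -0.5]  # hypothetical coefficients on the original predictors
theta = [sum(R_star[i][j] * beta[j] for j in range(2)) for i in range(2)]
print("theta =", theta)
print("recovered beta =", back_solve(R_star, theta))
```

Because the sampler works on theta rather than beta, a prior stated for the coefficients can mean something different when QR = TRUE; the rstanarm documentation for the QR argument explains how priors are interpreted in that case.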

David_Westergaard · July 9, 2017, 10:58am

Yep. I didn’t have much luck either when searching Google Scholar for rescaling of count data. I’m now trying to run the model in CmdStan with a pretty wide prior on the scale:

// assumes sigma_X1 is declared real<lower=0>, so normal(0, 5) acts as a half-normal prior
sigma_X1 ~ normal(0, 5);
beta_X1 ~ normal(0, sigma_X1);

Hopefully the better choice of prior will give a better result.
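One quick way to judge whether this prior is wide enough is to simulate from it before fitting. The sketch below assumes sigma_X1 is constrained to be positive (so normal(0, 5) acts as a half-normal) and estimates the implied prior mass on |beta_X1| > 30, the low end of the MLE range reported earlier; the seed and number of draws are arbitrary.

```python
import random


def share_beyond(threshold: float, n: int = 100_000, seed: int = 2017) -> float:
    """Monte Carlo estimate of P(|beta| > threshold) under
    sigma ~ half-normal(0, 5), beta ~ normal(0, sigma)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        sigma = abs(rng.gauss(0.0, 5.0))  # half-normal draw for the scale
        beta = rng.gauss(0.0, sigma)      # coefficient given that scale
        hits += abs(beta) > threshold
    return hits / n


print(f"P(|beta_X1| > 30) is roughly {share_beyond(30.0):.4f}")
```

Under these assumptions the estimate should come out well under 1%, suggesting that even this hierarchical prior still pulls strongly against coefficients in the 30-50 range.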

betanalpha · July 9, 2017, 1:53pm

Unfortunately, you can’t scale integers the same way you can scale real numbers. This is why count models with extremely large counts typically become ungainly. At the same time, when you have so many counts, it becomes reasonable to approximate the counts with real numbers…
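To make the last point concrete: for a large mean lambda, the Poisson pmf is very close to a normal density with mean and variance lambda, which is one reason a continuous approximation becomes reasonable for counts in the thousands. A stdlib-only check at lambda = 10000 (an arbitrary illustrative value):

```python
import math


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(Y = k) for Y ~ Poisson(lam), computed via lgamma for numerical stability."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


lam = 10_000.0
for k in (9_800, 9_900, 10_000, 10_100, 10_200):
    ratio = math.exp(poisson_logpmf(k, lam) - normal_logpdf(k, lam, lam))
    print(f"k={k}: Poisson pmf / normal density = {ratio:.4f}")
```

The agreement degrades for small means and in the far tails, so with counts that go all the way down to 0, as in this thread, the approximation would be poor for the smallest observations.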

Related topics

- Posterior predictive checking: 10 replies, 2042 views, November 26, 2019
- Issue using posterior_vs_prior with Poisson or Neg binomial models: 11 replies, 746 views, September 14, 2019
- Help - Conducting Count Data Model - Negative Binomial Model: 1 reply, 310 views, September 26, 2023
- Negative Binomial Regression worse mean prediction performance than log_poisson: 1 reply, 499 views, January 14, 2021
- Fitting count data with negative binomial - long tail: 7 replies, 398 views, July 16, 2024