I’m trying to fit a regression model to some count data using rstanarm. X1 is a binary covariate, and X2 is a categorical covariate with 21 values. I’m trying both the poisson and negative binomial model, called using the following command,
stan_glm1 <- stan_glm(Count ~ X1 + X2, data = d, family = poisson, offset=log(Offset), prior = normal(0, 1), prior_intercept = normal(0, 1), chains = 4, cores = 4, seed = 123456) stan_glm2 <- stan_glm(Count ~ X1 + X2, data = d, family = neg_binomial_2, offset=log(Offset), prior = normal(0, 1), prior_intercept = normal(0, 1), chains = 4, cores = 4, seed = 123456)
Both models (seemingly) converge: all R-hat < 1.01, no divergences, trace plots looking good etc. However, the results are widely different. The negative binomial model completely fails the posterior predictive checking, to such a degree that the prediction is several order of magnitude off. The Poisson posterior predictive check fits the data very well. I tried the same model in CmdStan (v.2.15) and it’s the same, so I don’t suspect it’s a problem inherent to rstanarm.
I’m not sure if there is a convenient way to provide the results from the model, but the mean_PPD is as follows:
Poisson mean_PPD 6487.6 17.8
nb mean_PPD 9230.1 4152.5
It could be that the poisson simply fits the model better, but shouldn’t the nb dispersion just go towards very high values, then, and approximate the poisson? Have I missed anything? Unfortunately I am not able to disclose the data.