Hello!
The following data (K.CSV) are from Koch, G.G.; Atkinson, S.S.; Stokes, M.E. 1986. Encyclopedia of Statistical Sciences. Volume 7. John Wiley. New York. Edited by Samuel Kotz and Norman Johnson.
Melanoma,Area,AgeGroup,Population
61,0,<35,2880262
76,0,35-44,564535
98,0,45-54,592983
104,0,54-64,450740
63,0,65-74,270908
80,0,>74,161850
64,1,<35,1074246
75,1,35-44,220407
68,1,45-54,198119
63,1,54-64,134084
45,1,65-74,70708
27,1,>74,34233
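For anyone who wants to reproduce this without downloading the attachment, the table above can be read straight into R. This is only a sketch: the object name `data` is chosen to match the model call in this post, and the explicit ordering of the `AgeGroup` levels is my own assumption (the labels are kept exactly as they appear in the file).

```r
# Read the table shown above from inline text instead of the K.CSV attachment.
csv_text <- "Melanoma,Area,AgeGroup,Population
61,0,<35,2880262
76,0,35-44,564535
98,0,45-54,592983
104,0,54-64,450740
63,0,65-74,270908
80,0,>74,161850
64,1,<35,1074246
75,1,35-44,220407
68,1,45-54,198119
63,1,54-64,134084
45,1,65-74,70708
27,1,>74,34233"

data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Assumption: order the age groups from youngest to oldest, so that "<35"
# is the reference level in the regression.
age_levels <- c("<35", "35-44", "45-54", "54-64", "65-74", ">74")
data$AgeGroup <- factor(data$AgeGroup, levels = age_levels)
```

With only 12 rows, `str(data)` is a quick way to confirm that the counts and populations were read as numbers and `AgeGroup` as a factor.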
I can fit the following Poisson model:
fit_1a <- rstanarm::stan_glm(Melanoma ~ Area + AgeGroup,
                             offset = log(Population),
                             family = poisson(link = "log"),
                             data = data)
Although the resulting coefficients are the same as in Koch (1986), the posterior predictive checks don't look good. Is it possible to improve the fit, meaning better PSIS diagnostics and a better PPC, by transforming the data or by creating new variables from the same data? Or are these variables simply not enough, so that more variables are needed?
bayesplot::pp_check(fit_1a) + ggplot2::xlim(0, 300)
loo1 <- loo::loo(fit_1a, save_psis = TRUE)
plot(loo1)
Also, which do the experts prefer: good-looking PSIS diagnostics or a good-looking yrep? I think both of them need to be good.
When I use a negative binomial model instead of a Poisson one, the PSIS diagnostics improve but the yrep gets worse.
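For reference, the negative binomial comparison can be sketched roughly as follows. The helper name `fit_negbin` is hypothetical (it is not part of rstanarm or loo); it simply refits the same formula and offset with `rstanarm::neg_binomial_2()` as the family and returns the fit together with its PSIS-LOO object. I am assuming the default priors and sampler settings here.

```r
# Hypothetical helper (name is illustrative only): refit the model above with
# a negative binomial likelihood and compute PSIS-LOO for it.
fit_negbin <- function(data) {
  fit <- rstanarm::stan_glm(
    Melanoma ~ Area + AgeGroup,
    offset = log(Population),
    family = rstanarm::neg_binomial_2(link = "log"),
    data = data
  )
  # Return both pieces so the PPC and the PSIS plot can be drawn afterwards.
  list(fit = fit, loo = loo::loo(fit, save_psis = TRUE))
}
```

Calling `nb <- fit_negbin(data)` and then `bayesplot::pp_check(nb$fit)`, `plot(nb$loo)` and `loo::loo_compare(loo1, nb$loo)` gives the same diagnostics as for the Poisson fit, side by side.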