Melanoma example

Hello!

I am working with the following data (K.CSV), which are from Koch, G.G.; Atkinson, S.S.; Stokes, M.E. 1986. Encyclopedia of Statistical Sciences. Volume 7. John Wiley, New York. Edited by Samuel Kotz and Norman Johnson.

Melanoma,Area,AgeGroup,Population
61,0,<35,2880262
76,0,35-44,564535
98,0,45-54,592983
104,0,54-64,450740
63,0,65-74,270908
80,0,>74,161850
64,1,<35,1074246
75,1,35-44,220407
68,1,45-54,198119
63,1,54-64,134084
45,1,65-74,70708
27,1,>74,34233
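The model code uses a data frame called `data`. As a sketch, the same table can be typed in directly in R, which avoids depending on the attachment. The explicit age ordering (so that "<35" is the reference level) is an added choice, not part of the original post. It changes the intercept and age-group coefficients, which become contrasts against "<35", compared with R's default level order. The `RatePer100k` column is likewise a hypothetical helper for eyeballing the crude rates.

```r
# Rebuild the table above as a data frame (no file needed).
# AgeGroup labels are copied verbatim from the CSV, including "54-64".
data <- data.frame(
  Melanoma   = as.integer(c(61, 76, 98, 104, 63, 80, 64, 75, 68, 63, 45, 27)),
  Area       = rep(0:1, each = 6),
  AgeGroup   = rep(c("<35", "35-44", "45-54", "54-64", "65-74", ">74"), 2),
  Population = c(2880262, 564535, 592983, 450740, 270908, 161850,
                 1074246, 220407, 198119, 134084, 70708, 34233)
)

# Added choice: order the age levels by age so "<35" is the reference level.
# Without this, the levels come from sorting the labels (locale-dependent).
data$AgeGroup <- factor(data$AgeGroup,
                        levels = c("<35", "35-44", "45-54", "54-64", "65-74", ">74"))

# Hypothetical helper column: crude incidence per 100,000 people per cell.
data$RatePer100k <- 1e5 * data$Melanoma / data$Population
print(data)
```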

I can fit the following Poisson model:

data <- read.csv("K.CSV")  # the attached file, with the columns shown above
fit_1a <- rstanarm::stan_glm(Melanoma ~ Area + AgeGroup, offset = log(Population),
                             family = poisson(link = "log"),
                             data = data)
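As a quick cross-check that needs no MCMC, here is a base-R sketch that fits the same Poisson model by maximum likelihood with `glm()` and then computes the Pearson chi-square divided by the residual degrees of freedom, a standard rough overdispersion check (values well above 1 suggest more variation than the Poisson allows). With rstanarm's weakly informative default priors, the posterior medians of `fit_1a` should be roughly similar to these estimates. The names `fit_glm` and `dispersion` are illustrative.

```r
# Base-R sketch: maximum-likelihood version of fit_1a (no MCMC needed).
# The data frame is rebuilt from the table above.
data <- data.frame(
  Melanoma   = as.integer(c(61, 76, 98, 104, 63, 80, 64, 75, 68, 63, 45, 27)),
  Area       = rep(0:1, each = 6),
  AgeGroup   = rep(c("<35", "35-44", "45-54", "54-64", "65-74", ">74"), 2),
  Population = c(2880262, 564535, 592983, 450740, 270908, 161850,
                 1074246, 220407, 198119, 134084, 70708, 34233)
)

# Same formula, family and offset as the stan_glm() call.
fit_glm <- glm(Melanoma ~ Area + AgeGroup, offset = log(Population),
               family = poisson(link = "log"), data = data)
print(round(coef(fit_glm), 3))

# Rough overdispersion check: Pearson chi-square / residual df.
# 12 observations - 7 coefficients = 5 residual df, so this ratio is noisy.
pearson_chisq <- sum(residuals(fit_glm, type = "pearson")^2)
dispersion <- pearson_chisq / df.residual(fit_glm)
print(c(residual_df = df.residual(fit_glm), dispersion = dispersion))
```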

Although the resulting coefficients match those in Koch (1986), the posterior predictive checks don't look good. Is it possible to improve the fit, so that the PSIS diagnostics and posterior predictive checks look better, by transforming the data or by creating new variables from the same data? Or are these variables insufficient, so that more variables would be needed?

bayesplot::pp_check(fit_1a) + ggplot2::xlim(0, 300)
loo1 <- loo::loo(fit_1a, save_psis = TRUE)
plot(loo1)

[Plot: output of bayesplot::pp_check(fit_1a)]
[Plot: output of plot(loo1)]
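To read the Pareto k values behind `plot(loo1)` as numbers, here is a sketch, assuming rstanarm, loo and bayesplot are installed. It prints the k summary table, lists observations with k above 0.7 (a commonly used cutoff), and draws a per-observation interval PPC, which can be easier to read than a density overlay when there are only 12 counts. The data are rebuilt inline, and `seed = 1234` is an arbitrary choice.

```r
# Sketch: inspect Pareto k diagnostics and an interval-style PPC.
# Assumes rstanarm, loo and bayesplot are installed; seed = 1234 is arbitrary.
data <- data.frame(
  Melanoma   = as.integer(c(61, 76, 98, 104, 63, 80, 64, 75, 68, 63, 45, 27)),
  Area       = rep(0:1, each = 6),
  AgeGroup   = rep(c("<35", "35-44", "45-54", "54-64", "65-74", ">74"), 2),
  Population = c(2880262, 564535, 592983, 450740, 270908, 161850,
                 1074246, 220407, 198119, 134084, 70708, 34233)
)

fit_1a <- rstanarm::stan_glm(Melanoma ~ Area + AgeGroup, offset = log(Population),
                             family = poisson(link = "log"), data = data,
                             seed = 1234, refresh = 0)
loo1 <- loo::loo(fit_1a, save_psis = TRUE)

# Numeric summary of the Pareto k values shown in plot(loo1).
print(loo::pareto_k_table(loo1))
# Indices of observations whose k exceeds 0.7.
print(loo::pareto_k_ids(loo1, threshold = 0.7))

# Per-observation predictive intervals instead of a density overlay.
p <- bayesplot::pp_check(fit_1a, plotfun = "intervals")
print(p)
```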
Also, which do the experts prefer: good-looking PSIS diagnostics or a good-looking yrep? I think both should be good.
When I use a negative binomial model instead of a Poisson, the PSIS diagnostics improve but the yrep looks worse.
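One way to put the Poisson and negative binomial fits on a common scale is `loo::loo_compare()`. Below is a sketch, assuming rstanarm's `neg_binomial_2` family and the same formula and offset as `fit_1a`. The names `fit_pois`, `fit_nb` and `comp` are illustrative. With 12 observations and high Pareto k values, the elpd difference and its standard error should be read only as rough guides.

```r
# Sketch: compare Poisson and negative binomial fits with PSIS-LOO.
# Assumes rstanarm and loo are installed; seed = 1234 is arbitrary.
data <- data.frame(
  Melanoma   = as.integer(c(61, 76, 98, 104, 63, 80, 64, 75, 68, 63, 45, 27)),
  Area       = rep(0:1, each = 6),
  AgeGroup   = rep(c("<35", "35-44", "45-54", "54-64", "65-74", ">74"), 2),
  Population = c(2880262, 564535, 592983, 450740, 270908, 161850,
                 1074246, 220407, 198119, 134084, 70708, 34233)
)

fit_pois <- rstanarm::stan_glm(Melanoma ~ Area + AgeGroup, offset = log(Population),
                               family = poisson(link = "log"), data = data,
                               seed = 1234, refresh = 0)
fit_nb <- rstanarm::stan_glm(Melanoma ~ Area + AgeGroup, offset = log(Population),
                             family = rstanarm::neg_binomial_2(link = "log"),
                             data = data, seed = 1234, refresh = 0)

loo_pois <- loo::loo(fit_pois)
loo_nb <- loo::loo(fit_nb)

# Rows are ordered best-first; elpd_diff is relative to the top model and
# se_diff is its standard error.
comp <- loo::loo_compare(loo_pois, loo_nb)
print(comp)
```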

The PPC plot looks fine. With only 12 observations, every observation is influential, and that is what the Pareto k values show. So the diagnostics you show don't indicate anything badly wrong, but with only 12 observations there is also little hope that the data contain enough information to support a more elaborate model.
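A frequentist analogue makes the "every observation is influential" point concrete. This uses leverage from the maximum-likelihood `glm()` fit, which is not the same quantity as Pareto k. The model has 7 coefficients (intercept, Area, and 5 age-group contrasts) for 12 observations, so the leverages sum to 7 and average 7/12 ≈ 0.58. In other words, each observation largely determines its own fitted value.

```r
# Sketch: leverage of each observation in the ML Poisson fit (base R only).
data <- data.frame(
  Melanoma   = as.integer(c(61, 76, 98, 104, 63, 80, 64, 75, 68, 63, 45, 27)),
  Area       = rep(0:1, each = 6),
  AgeGroup   = rep(c("<35", "35-44", "45-54", "54-64", "65-74", ">74"), 2),
  Population = c(2880262, 564535, 592983, 450740, 270908, 161850,
                 1074246, 220407, 198119, 134084, 70708, 34233)
)

fit_glm <- glm(Melanoma ~ Area + AgeGroup, offset = log(Population),
               family = poisson(link = "log"), data = data)

# Diagonal of the (IRLS-weighted) hat matrix; for a full-rank fit it sums
# to the number of coefficients.
h <- hatvalues(fit_glm)
print(round(h, 2))
print(c(sum = sum(h), mean = mean(h), n_coef = length(coef(fit_glm))))
```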
