When using rstanarm
with the roaches
dataset I get two different charts of the goodness of fit of a model. The model - which is taken from Aki Vehtari’s notebook on cross-validation model for model selection is:
stan_glmnb <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
data = roaches, family = neg_binomial_2,
prior = normal(0,2.5), prior_intercept = normal(0,5),
chains = 4, cores = 1, seed = 1704009, refresh=0)
# create loo object
loo_glmnb <- loo(stan_glmnb, save_psis = TRUE)
Now following exactly the notebook we have:
loo_pit <- function(y, yrep, lw) {
exp_log_sum_exp <- function(x) {
m <- suppressWarnings(max(x))
exp(m + log(sum(exp(x - m))))
}
pit <- vapply(seq_len(ncol(yrep)), function(j) {
sel_min <- yrep[, j] < y[j]
pit_min <- exp_log_sum_exp(lw[sel_min,j])
sel_sup <- yrep[, j] == y[j]
pit_sup <- pit_min + exp_log_sum_exp(lw[sel_sup,j])
runif(1, pit_min, pit_sup)
}, FUN.VALUE = 1)
pmax(pmin(pit, 1), 0)
}
ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
lw=weights(loo_glmnb$psis_object))) +
geom_abline() +
ylim(c(0,1))
This results in the following figure that shows a good fit:
The documentation of `ppc_loo_pit_qq states:
For ppc_loo_pit_overlay() and ppc_loo_pit_qq(), optionally a vector of precomputed PIT values that can be specified instead of y, yrep, and lw (these are all ignored if pit is specified). If not specified the PIT values are computed internally before plotting
So I thought that I could avoid the included function by specifying the alternative arguments to the Bayesplot function and let it do the work (to hide away what you do not understand)
ppc_loo_pit_qq(
y = roaches$y,
yrep = posterior_predict(stan_glmnb),
lw = weights(loo_glmnb$psis_object),
pit = NULL,
compare = "uniform"
) +
geom_abline() +
ylim(c(0,1))
But this plots as:
So presumably the loo_pit
function prepares the pit vector differently to what the unassisted Bayesplot function does. I expect I have missed something here but I have no idea what.