Two different answers from ppc_loo_pit_qq

peter2108 · January 6, 2025, 6:30pm

When using rstanarm with the roaches dataset I get two different charts of the goodness of fit of a model. The model - which is taken from Aki Vehtari’s notebook on cross-validation model for model selection is:

stan_glmnb <- stan_glm(y ~ sqrt_roach1 + treatment + senior, offset = log(exposure2),
                      data = roaches, family = neg_binomial_2, 
                      prior = normal(0,2.5), prior_intercept = normal(0,5),
                      chains = 4, cores = 1, seed = 1704009, refresh=0)
# create loo object
loo_glmnb <- loo(stan_glmnb, save_psis = TRUE)

Now following exactly the notebook we have:

loo_pit <- function(y, yrep, lw) {
  exp_log_sum_exp <- function(x) {
    m <- suppressWarnings(max(x))
    exp(m + log(sum(exp(x - m))))
  }
  pit <- vapply(seq_len(ncol(yrep)), function(j) {
    sel_min <- yrep[, j] < y[j]
    pit_min <- exp_log_sum_exp(lw[sel_min,j])
    sel_sup <- yrep[, j] == y[j]
    pit_sup <- pit_min + exp_log_sum_exp(lw[sel_sup,j])
    runif(1, pit_min, pit_sup)
  }, FUN.VALUE = 1)
  pmax(pmin(pit, 1), 0)
}

ppc_loo_pit_qq(pit=loo_pit(y = roaches$y, yrep = posterior_predict(stan_glmnb),
                           lw=weights(loo_glmnb$psis_object))) + 
  geom_abline() +
  ylim(c(0,1))

This results in the following figure that shows a good fit:

The documentation of `ppc_loo_pit_qq states:

For ppc_loo_pit_overlay() and ppc_loo_pit_qq(), optionally a vector of precomputed PIT values that can be specified instead of y, yrep, and lw (these are all ignored if pit is specified). If not specified the PIT values are computed internally before plotting

So I thought that I could avoid the included function by specifying the alternative arguments to the Bayesplot function and let it do the work (to hide away what you do not understand)

ppc_loo_pit_qq(
  y = roaches$y,
  yrep = posterior_predict(stan_glmnb),
  lw =  weights(loo_glmnb$psis_object),
  pit = NULL,
  compare = "uniform"
) + 
  geom_abline() + 
  ylim(c(0,1))

But this plots as:

So presumably the loo_pit function prepares the pit vector differently to what the unassisted Bayesplot function does. I expect I have missed something here but I have no idea what.

jonah · January 7, 2025, 1:12am

I haven’t had a chance to look into this thoroughly but I have a guess:

I think Aki’s notebook is using an updated version of the PIT calculation that’s better specifically for discrete data like yours.

bayesplot uses the loo_pit() function from our rstantools package, which has been updated on GitHub by @TeemuSailynoja to use the improved version, but it hasn’t been released on CRAN yet (that’s mostly my fault – I’ve been working on other packages and projects and I honestly forgot we had unreleased updates for rstantools). For now, if you want to install it from GitHub you can use:

# install.packages("remotes")
remotes::install_github("stan-dev/rstantools")

You’ll probably then need to restart your R session in order for bayesplot to use that version of rstantools that you just installed from GitHub.

If you try that let me know if that was indeed the issue.

peter2108 · January 7, 2025, 10:29am

Yes, that fixes it. Thanks.

Topic		Replies	Views
How to decide on the best model? Modeling loo	3	601	December 3, 2020
LOO Model Comparison Alternative Modeling rstan , techniques , loo , cmdstanr	3	92	March 27, 2025
I used stan_jm with different option for assoc like etavalue, etaauc etc. How can I compare these models, any fitness information I can get from these model? rstanarm	5	853	January 22, 2021
Model selection of nonlinear flexible hierarchical model with loo Modeling loo , hierarchical-model , model-comparison	1	799	June 14, 2021
Interpret pareto k diagnostic Modeling rstan , fitting-issues , loo	3	1632	August 3, 2023

Two different answers from ppc_loo_pit_qq

Related topics