Useful PPC visual checks for counts of different magnitudes?

I’m fitting a negative binomial model with rstanarm::stan_glmer(), and I’m trying to figure out a useful way to display the posterior predictive checks (PPCs). The challenge is that the data are highly skewed (lots of zeros and ones) with a very long tail, so (for example) the bayesplot::ppc_dens_overlay() plot looks like this:

Which is not super useful. I was checking out the PPCs for discrete data, which are useful, but I think they solve a different problem.

Essentially my question is: for an example like this, what is a useful and meaningful way to actually plot the posterior predictive samples in a way that can be interpreted? My only two thoughts are 1) split the plot into a couple of different x-axes that have comparable y-axis values, or 2) log the values, but that feels super wrong for reasons I’m not sure of.

Any thoughts welcome!

Here’s a working example, if that’s useful:

set.seed(123)  # arbitrary seed so the simulated example is reproducible
data <- rnbinom(25000, size = 0.099, mu = 0.34)

df <- data.frame(
    y = data,
    fixed = sample(as.factor(c(1:25)), 25000, replace = TRUE),
    random1 = sample(as.factor(c(1:25)), 25000, replace = TRUE),
    random2 = sample(as.factor(c(1:55)), 25000, replace = TRUE)
)
model <- rstanarm::stan_glmer(
    y ~ fixed + (1 | random1) + (1 | random2),
    data = df,
    family = rstanarm::neg_binomial_2(link = "log")
)
bayesplot::ppc_dens_overlay(df$y, rstanarm::posterior_predict(model, draws = 500))

I’ve had success using the “pseudo_log” transformation built into ggplot, which uses a linear scale near zero and smoothly transitions to a log scale:

bayesplot::ppc_dens_overlay(df$y, rstanarm::posterior_predict(model, draws = 500)) + ggplot2::scale_x_continuous(trans = "pseudo_log")

It also works well with the rootogram PPC functions, if you prefer those for your checks.
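
In case it helps to see what the axis is doing: as far as I know, the default pseudo-log transform in the scales package (sigma = 1, natural-log base) is an inverse hyperbolic sine, asinh(x / 2). The base-R sketch below writes that out by hand; the helper name `pseudo_log` and the example counts are just for illustration, not part of any package.

```r
# Hand-written version of the default pseudo-log transform
# (assumed form: asinh(x / (2 * sigma)) / log(base), with sigma = 1, base = e)
pseudo_log <- function(x, sigma = 1, base = exp(1)) {
  asinh(x / (2 * sigma)) / log(base)
}

# Illustrative counts spanning several orders of magnitude
counts <- c(0, 1, 2, 10, 100, 1000)

comparison <- data.frame(
  count      = counts,
  pseudo_log = pseudo_log(counts),
  log        = log(counts)  # -Inf at zero, which is why a plain log axis drops the zeros
)
print(comparison)
```

Zero maps to zero, small counts are spaced roughly linearly, and large counts land very close to log(count), so the zeros and ones stay visible while the tail is compressed.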


A square root transformation of the scale is also commonly used for presenting counts that include 0. For discrete count data, rootograms are usually better than kernel density plots.
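
A rough sketch of both suggestions, assuming bayesplot and ggplot2 are installed. To keep it quick to run, `y` and `yrep` here are simulated directly from a negative binomial as stand-ins (an assumption for illustration only); with a fitted model you would pass the observed counts and the rstanarm::posterior_predict() draws instead.

```r
library(bayesplot)
library(ggplot2)

set.seed(1)  # arbitrary seed

# Stand-in data (assumption): simulated counts rather than a fitted model
y    <- rnbinom(2000, size = 0.099, mu = 0.34)
yrep <- matrix(rnbinom(100 * length(y), size = 0.099, mu = 0.34), nrow = 100)

# Rootogram: compares observed and expected counts on a square-root scale
p_root <- ppc_rootogram(y, yrep, style = "hanging")

# Density overlay with a square-root x-axis, which keeps zero on the axis
p_sqrt <- ppc_dens_overlay(y, yrep) + scale_x_sqrt()

print(p_root)
print(p_sqrt)
```

For the model in the original post, the equivalent call would be something like bayesplot::ppc_rootogram(df$y, rstanarm::posterior_predict(model, draws = 500)).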


Amazing! Yes, this is great. I’ll try the pseudo-log as well as the square root that @avehtari suggested and see which works best. Thanks both!
