Useful PPC visual checks for counts of different magnitudes?

I’m fitting a model with rstanarm::stan_glmer() using a negative binomial family, and I’m trying to figure out a useful way to display the PPC checks. The challenge is that the data are highly skewed (lots of zeros and ones) with a very long tail, so (for example) the bayesplot::ppc_dens_overlay() looks like this:

That is not super useful. I was checking out the PPCs for discrete data, which are useful, but I think they solve a different problem.

Essentially my question is: for an example like this, what is a useful and meaningful way to plot the posterior predictive samples so that they can actually be interpreted? My only two thoughts are 1) split the plot into a couple of panels covering different x-axis ranges, so that each panel has comparable y-axis values, or 2) log the values, but that feels super wrong for reasons I’m not sure of.

Any thoughts welcome!

Here’s a working example if that’s useful:

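# Simulate highly skewed counts: mostly zeros and ones, with a long tail
data <- rnbinom(25000, size = 0.099, mu = 0.34)

# One fixed factor and two grouping factors, all assigned at random
df <- data.frame(
    y = data,
    fixed = sample(as.factor(c(1:25)), 25000, replace = TRUE),
    random1 = sample(as.factor(c(1:25)), 25000, replace = TRUE),
    random2 = sample(as.factor(c(1:55)), 25000, replace = TRUE)
)

# Negative binomial GLMM with a log link
model <- rstanarm::stan_glmer(
    y ~ fixed + (1 | random1) + (1 | random2),
    data = df,
    family = rstanarm::neg_binomial_2(link = "log")
)

# Density overlay of observed y against 500 posterior predictive draws
bayesplot::ppc_dens_overlay(df$y, rstanarm::posterior_predict(model, draws = 500))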

I’ve had success using the “pseudo_log” transformation available in ggplot2 (it comes from the scales package), which is roughly linear near zero and smoothly transitions to a log scale:

bayesplot::ppc_dens_overlay(df$y, rstanarm::posterior_predict(model, draws = 500)) + ggplot2::scale_x_continuous(trans = "pseudo_log")

It also works well with the rootogram PPC functions, if you prefer those for your checks.
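
To see what the pseudo-log scale does to the zeros without waiting for a model fit, here is a minimal sketch. It skips Stan entirely and simulates both the data and a stand-in yrep matrix directly from rnbinom(); that shortcut and the names y_sim and yrep_sim are assumptions for illustration, not part of the model above. It checks the transformation via scales::pseudo_log_trans() and then puts the same scale on a rootogram:

```r
# Sketch only: simulated stand-ins replace the fitted model's predictions.
set.seed(1)
n <- 2000
y_sim <- rnbinom(n, size = 0.099, mu = 0.34)
# 50 fake "posterior predictive" draws, one per row,
# the same layout that posterior_predict() returns
yrep_sim <- matrix(rnbinom(50 * n, size = 0.099, mu = 0.34), nrow = 50)

# pseudo_log is asinh-based: finite at zero, close to log(x) for large x
tr <- scales::pseudo_log_trans()
vals <- c(0, 1, 10, 100, 1000)
pseudo_vals <- tr$transform(vals)
log_vals <- log(vals)  # -Inf at zero, which is why a plain log axis drops the zeros

# Rootogram with the pseudo-log scale on the x axis (the count values)
p_root <- bayesplot::ppc_rootogram(y_sim, yrep_sim) +
  ggplot2::scale_x_continuous(trans = "pseudo_log")
```

Call print(p_root) to draw the plot. With a real fit, df$y and the posterior_predict() matrix would take the place of y_sim and yrep_sim.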


A square root transformation of the scale is also commonly used for presenting counts that include 0. For discrete count data, rootograms are usually better than kernel density plots.
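
A minimal sketch of both suggestions, run on simulated stand-ins rather than the fitted model. The names y_sim and yrep_sim are hypothetical, and drawing yrep directly from rnbinom() is an assumption made only to keep the example fast:

```r
# Sketch only: simulated stand-ins replace the fitted model's predictions.
set.seed(2)
n <- 2000
y_sim <- rnbinom(n, size = 0.099, mu = 0.34)
yrep_sim <- matrix(rnbinom(50 * n, size = 0.099, mu = 0.34), nrow = 50)

# Square-root x axis: zero stays on the plot and the long tail is compressed
p_sqrt <- bayesplot::ppc_dens_overlay(y_sim, yrep_sim) +
  ggplot2::scale_x_sqrt()

# Rootogram: observed vs. expected frequency of each count value,
# drawn on a square-root frequency axis
p_hang <- bayesplot::ppc_rootogram(y_sim, yrep_sim, style = "hanging")
```

Use print(p_sqrt) and print(p_hang) to draw the plots. The "hanging" style is just one choice; ppc_rootogram() also accepts "standing" (the default) and "suspended".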


Amazing! Yes, this is great. I’ll try the pseudo-log as well as the square root that @avehtari suggested and see which works best. Thanks both!

