Raw data rendered differently by pp_check() in brms

The same data were fitted with three slightly different models: negative binomial, zero-inflated negative binomial, and hurdle negative binomial. When a posterior predictive check was performed using pp_check() in brms, the raw data (black curve) was rendered differently near 0 in each plot (see the three attached pp_check plots). Is there any way to resolve this?

test0NB.pdf (15.0 KB) testNB-hurdle.pdf (16.2 KB) testNB.pdf (13.8 KB)

I guess @Jonah would know, but it looks like the x-axis is the same in all three plots?

Yes, it is the same data, and the x-axis is the same across all three plots.

Hmm, yeah, I see what you mean. But the y-axis could really be changing considerably between the plots. Can you turn on the y-axis labeling? If you have bayesplot loaded, pp_check(…) + yaxis_text() should turn on the y-axis labels. If the y-axis turns out to be changing a lot between the plots, then I wouldn’t be too surprised that the data looks different even when it isn’t. The shape is roughly the same near 0; it’s just that the slope is steeper in some of the plots than in others, so that could be due to the y-axis differences (unless I’m misunderstanding, which is totally possible!).

If the y-axis differences don’t explain this, then we can look into it further if you can share a reproducible example. If there’s a bug here, it’s probably happening in brms before the data is passed to bayesplot, but my hunch is that brms is probably doing the right thing here (we’ll see if I end up being right!).

Thanks, Jonah!

This time I forced the axes to be within the same range for the three plots using

pp_check(fm, nsamples = 200) +
  coord_cartesian(xlim = c(0, 1000), ylim = c(0, 0.007))

The attached figures look even more different near 0. I have seen subtle differences under a different scenario before, but this time the discrepancy is more dramatic. Let me know if you want a testing dataset.

test0NB.pdf (14.8 KB) testNB-hurdle.pdf (15.8 KB) testNB.pdf (13.0 KB)

Thanks for sharing the new plots. Yeah, they definitely still look different. The next thing to figure out is whether the issue is in brms or in the bayesplot package that brms calls to make the plots. Can you try making these plots using bayesplot directly instead of pp_check()? It’s pretty simple:

# assuming y is your outcome variable and fit is your brmsfit object
bayesplot::ppc_dens_overlay(y = y, yrep = posterior_predict(fit, nsamples = 200))

That should produce the same plot as the pp_check default. If the plots still have the same problem, then something weird could be happening in the bayesplot function that I need to figure out. Otherwise it’s probably in brms, and hopefully Paul can figure it out.

Yes, the same problem persisted with bayesplot. Let me know how I can assist with the diagnosis.

I think the best chance of figuring this out is if I can reproduce it on my computer. Since it happens with bayesplot directly, I guess I don’t need to fit the models; I just need the y and yrep passed to ppc_dens_overlay. Is it possible for you to share those objects?

The objects are attached here.

yrep1, yrep2 and yrep3 correspond to the three plots I showed before. Use the following code to reproduce the plots:

dev.new()
bayesplot::ppc_dens_overlay(y = y, yrep = yrep1) + coord_cartesian(xlim = c(0, 1000), ylim=c(0, 0.007))

dev.new()
bayesplot::ppc_dens_overlay(y = y, yrep = yrep2) + coord_cartesian(xlim = c(0, 1000), ylim=c(0, 0.007))

dev.new()
bayesplot::ppc_dens_overlay(y = y, yrep = yrep3) + coord_cartesian(xlim = c(0, 1000), ylim=c(0, 0.007))

Thanks, Jonah!

Thanks for sharing the objects, that’s really helpful. I think I’ve figured out what’s going on, and it doesn’t actually have anything to do with bayesplot or brms, but rather with the difference between ggplot2’s coord_cartesian() and ggplot2’s xlim() and ylim() functions.

Can you try this?

bayesplot::ppc_dens_overlay(y = y, yrep = yrep1) + 
  xlim(0, 1000) + 
  ylim(0, 0.01)

bayesplot::ppc_dens_overlay(y = y, yrep = yrep2) + 
  xlim(0, 1000) + 
  ylim(0, 0.01)

So when I use xlim() and ylim() instead of coord_cartesian(), the problem you were seeing seems to go away (I actually would have thought it would be the opposite, but ggplot2 is confusing!). Do you see the same thing when you try it?


Yes, that worked out nicely! I have no idea why ggplot2 behaves like this.

Thanks a lot for resolving this, Jonah!


Hi @Gang,

this has actually bitten me a few times before. Using coord_cartesian() only zooms the plot and doesn’t change the data. Using xlim() and ylim(), however, comes with a warning in the ggplot2 documentation:

Be warned that this will remove data outside the limits and this can produce unintended results.
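To make the distinction concrete, here is a minimal ggplot2 sketch using simulated negative binomial counts (hypothetical data, not the dataset from this thread) showing how the two approaches to limiting axes can produce different density curves:

```r
library(ggplot2)

# Hypothetical right-skewed count data, loosely mimicking the outcome
# variable discussed in this thread
set.seed(1)
d <- data.frame(x = rnbinom(1e4, size = 0.5, mu = 50))

p <- ggplot(d, aes(x)) + geom_density()

# coord_cartesian() zooms: the density is estimated from ALL the data,
# and only the plotting window is cropped afterwards.
p + coord_cartesian(xlim = c(0, 1000))

# xlim() sets scale limits: observations outside [0, 1000] are dropped
# (with a warning) BEFORE the density is estimated, so the curve itself
# is computed from a truncated sample and can look different near 0.
p + xlim(0, 1000)
```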

Good that Jonah found what the problem was :)

2 Likes

Thanks for sharing the information! Have a nice day.