Raw data rendered differently by pp_check() in brms

The same data were fitted with three slightly different models: negative binomial, zero-inflated negative binomial, and hurdle negative binomial. When a posterior predictive check was performed using pp_check() in brms, the raw data (black curve) was rendered differently near 0 in each plot (see the three attached pp_check plots). Is there any way to resolve this?

test0NB.pdf (15.0 KB) testNB-hurdle.pdf (16.2 KB) testNB.pdf (13.8 KB)

I guess @Jonah would know, but it looks like the x-axis is the same in all three plots?

Yes, it is the same data, and the x-axis is the same across all three plots.

Hmm, yeah, I see what you mean. But the y-axis could really be changing considerably between the plots. Can you turn on the y-axis labeling? If you have bayesplot loaded, pp_check(…) + yaxis_text() should turn on the y-axis labels. If the y-axis turns out to be changing a lot between the plots, then I wouldn’t be too surprised that the data looks different even when it isn’t. The shape is roughly the same near 0; it’s just that the slope is steeper in some of the plots than in others, so that could be due to the y-axis differences (unless I’m misunderstanding, which is totally possible!).

If the y-axis differences don’t explain this, then we can look into it further if you can share a reproducible example. If there’s a bug here, it’s probably happening in brms before the data is passed to bayesplot, but my hunch is that brms is probably doing the right thing here (we’ll see if I end up being right!).

Thanks, Jonah!

This time I forced the axes to be within the same range for the three plots using

pp_check(fm, nsamples = 200) +
  coord_cartesian(xlim = c(0, 1000), ylim = c(0, 0.007))

The attached figures look even more different near 0. I have seen subtle differences under a different scenario before, but this time the discrepancy is more dramatic. Let me know if you want a testing dataset.

test0NB.pdf (14.8 KB) testNB-hurdle.pdf (15.8 KB) testNB.pdf (13.0 KB)

Thanks for sharing the new plots. Yeah, they definitely still look different. The next thing to figure out is whether the issue is in brms or in the bayesplot package that brms calls to make the plots. Can you try making these plots using bayesplot directly instead of pp_check()? It’s pretty simple:

# assuming y is your outcome variable and fit is your brmsfit object
bayesplot::ppc_dens_overlay(y = y, yrep = posterior_predict(fit, nsamples = 200))

That should produce the same plot as the pp_check default. If the plots still have the same problem, then something weird could be happening in the bayesplot function that I need to figure out. Otherwise it’s probably in brms, and hopefully Paul can figure it out.

Yes, the same problem persisted with bayesplot. Let me know how I can assist with the diagnosis.

I think the best chance of figuring this out is if I can reproduce it on my computer. Since it happens with bayesplot directly, I guess I don’t need to fit the models; I just need the y and yrep passed to ppc_dens_overlay. Is it possible for you to share those objects?

The objects are attached here.

yrep1, yrep2 and yrep3 correspond to the three plots I showed before. Use the following code to reproduce the plots:

dev.new()
bayesplot::ppc_dens_overlay(y = y, yrep = yrep1) + coord_cartesian(xlim = c(0, 1000), ylim=c(0, 0.007))

dev.new()
bayesplot::ppc_dens_overlay(y = y, yrep = yrep2) + coord_cartesian(xlim = c(0, 1000), ylim=c(0, 0.007))

dev.new()
bayesplot::ppc_dens_overlay(y = y, yrep = yrep3) + coord_cartesian(xlim = c(0, 1000), ylim=c(0, 0.007))

Thanks, Jonah!

Thanks for sharing the objects, that’s really helpful. I think I’ve figured out what’s going on, and it doesn’t actually have anything to do with bayesplot or brms, but rather with the difference between ggplot2’s coord_cartesian() and ggplot2’s xlim() and ylim() functions.

Can you try this?

bayesplot::ppc_dens_overlay(y = y, yrep = yrep1) + 
  xlim(0, 1000) + 
  ylim(0, 0.01)

bayesplot::ppc_dens_overlay(y = y, yrep = yrep2) + 
  xlim(0, 1000) + 
  ylim(0, 0.01)

So when I use xlim() and ylim() instead of coord_cartesian(), the problem you were seeing seems to go away (I actually would have thought it would be the opposite, but ggplot2 is confusing!). Do you see the same thing when you try it?


Yes, that worked out nicely! I have no idea why ggplot2 behaves like this.

Thanks a lot for resolving this, Jonah!


Hi @Gang,

this has actually bitten me a few times before. Using coord_cartesian() only zooms the plot and doesn’t change the data. Using xlim() and ylim(), however, comes with a warning in the ggplot2 documentation:

Be warned that this will remove data outside the limits and this can produce unintended results.
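To make the distinction concrete, here is a minimal ggplot2 sketch using simulated negative binomial counts (hypothetical data, not the dataset from this thread) showing how the two approaches to limiting axes can produce different density curves:

```r
library(ggplot2)

# Hypothetical right-skewed count data, loosely mimicking the outcome
# variable discussed in this thread
set.seed(1)
d <- data.frame(x = rnbinom(1e4, size = 0.5, mu = 50))

p <- ggplot(d, aes(x)) + geom_density()

# coord_cartesian() zooms: the density is estimated from ALL the data,
# and only the plotting window is cropped afterwards.
p + coord_cartesian(xlim = c(0, 1000))

# xlim() sets scale limits: observations outside [0, 1000] are dropped
# (with a warning) BEFORE the density is estimated, so the curve itself
# is computed from a truncated sample and can look different near 0.
p + xlim(0, 1000)
```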

Good that Jonah found what the problem was :)

2 Likes

Thanks for sharing the information! Have a nice day.