Discrepancy between y and yrep in pp_check: how bad is it?

The hyperlinked “this” in post #18 points to a @betanalpha post. I don’t know if it’s possible to turn off notifications for that sort of thing…

Maybe because I linked to a post or comment of yours? The odd thing is, I never got an email notification for this comment, so I didn’t see it until now.

Haha! Yes, I think I mentioned that somewhere above. I should have switched to histogram in the last post with the code.

Thanks for the great link! I think there might be some benefit here simply from a brms user standpoint, for those who frequently work with slider scale data. In this case the inflation is not just a nuisance, so one would likely not want to ignore it, and it seems it could be as convenient as zero_one_inflated_beta currently is in brms.

I don’t think @betanalpha was advocating for ignoring the inflation, but rather was pointing out that in many cases inflation models for otherwise continuous variables decompose into two pieces that don’t need to be fit jointly (they should still both be fit, but they can be fit separately). There are potentially two reasons to fit jointly. One is if the different pieces share parameters (e.g. covariance parameters in a random effect, or even more explicit forms of sharing, like if the linear predictor from one sub-model is used as a covariate in the other). The other is simply for the convenience of post-processing in brms as a single model. But there are also potentially reasons NOT to fit jointly, including the ease of diagnosing problematic posteriors in one sub-model or the other. It’s a modeling decision worth thinking about.
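
To make the decomposition concrete, here is a sketch of the likelihood, assuming the zero-one-inflated beta parameterization that brms uses and, for simplicity, parameters that are constant across observations. The symbols are my own shorthand: $\alpha$ for zoi (probability of a response at 0 or 1), $\gamma$ for coi (probability of 1 given a response at 0 or 1), and $\mu$, $\phi$ for the beta mean and precision.

$$
p(y \mid \alpha, \gamma, \mu, \phi) =
\begin{cases}
\alpha (1 - \gamma) & y = 0 \\
\alpha \gamma & y = 1 \\
(1 - \alpha)\, \mathrm{Beta}\big(y \mid \mu\phi, (1 - \mu)\phi\big) & 0 < y < 1
\end{cases}
$$

With $N_0$ zeros, $N_1$ ones, and $N_c$ responses strictly between 0 and 1, the product over all observations factors as

$$
\prod_{n=1}^{N} p(y_n \mid \alpha, \gamma, \mu, \phi)
= \underbrace{\alpha^{N_0 + N_1} (1 - \alpha)^{N_c}}_{\text{inflated vs. not}}
\;\underbrace{\gamma^{N_1} (1 - \gamma)^{N_0}}_{\text{one vs. zero}}
\prod_{n \,:\, 0 < y_n < 1} \mathrm{Beta}\big(y_n \mid \mu\phi, (1 - \mu)\phi\big)
$$

Each factor involves its own parameters only, so with independent priors the posterior factors the same way and the pieces can be fit separately. As soon as the pieces share something (correlated varying intercepts, for instance), the factorization no longer holds.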

Thanks!
Yes, he writes this: “In other words because the inflated and non-inflated points are essentially modeled by separate data generating processes the entire model decomposes into a binomial model for the total inflated observations and a baseline model for the non-zero observations.”

But I was referring to this: “In particular if the inflated counts are just a nuisance then we can ignore them entirely and just fit the non-inflated observations directly without any consideration of the inflation!”
I guess I misunderstood.

For slider scale data, I think the reasons to fit jointly that you listed would almost always apply. Usually you have respondents answering multiple questions, in which case I would think you would want to model the varying intercepts for respondent as correlated across the continuous and inflation parts, which in brms looks like (1|p|id). It is likely that some respondents are ambivalent and generally pick neutral values, while others take a ‘love it or hate it’ approach and pick all (1’s) or nothing (0’s). It would really depend on the data, but I think the joint approach might be more useful.
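
Roughly what I have in mind, as a sketch rather than a tested model. The column names (rating, condition, id) and the helper fit_slider_model() are hypothetical; the |p| term is what tells brms to estimate correlations among the respondent intercepts across the three parts.

```r
library(brms)

# Hypothetical data layout, one row per response:
#   rating     slider response rescaled to [0, 1]
#   condition  a predictor of interest
#   id         respondent identifier
slider_formula <- bf(
  rating ~ condition + (1 | p | id),  # mean of the continuous (beta) part
  zoi    ~ condition + (1 | p | id),  # probability of a response at 0 or 1
  coi    ~ condition + (1 | p | id)   # probability of 1, given a response at 0 or 1
)

# Hypothetical helper: fits the joint model on a data frame with the
# columns described above. Priors are left at the brms defaults here.
fit_slider_model <- function(slider_data) {
  brm(
    slider_formula,
    data   = slider_data,
    family = zero_one_inflated_beta()
  )
}
```

After fitting, pp_check(fit, type = "hist") would give the histogram version of the check, which shows the spikes at 0 and 1 more clearly than the default density overlay.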

Good point!

No, I think you understood perfectly, and I’m the one who read sloppily. If the inflated counts are just a nuisance, then you can often ignore them entirely.

I think you’re good to go. To my eye, what the pp_check() plot suggests is there might be other interesting things going on with your data, which might push you in a good direction for the next study.