pp_check shows implausible values for y

When I do the posterior predictive check with pp_check, the density plot shows y values that I do not actually have in my data set. My y values all fall between 0 and 4, but the plot shows values below 0 for both y and yrep. Where do these wrong y values come from, and does this tell me there is something wrong with my model?
Thanks

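This is because of the kernel density smoothing done when drawing the plot, and it is just a visual effect: the smoothed curve extends a little beyond the smallest and largest observed values. I would recommend using another pp_check type for that purpose.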
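
To see how smoothing alone can put density below 0, here is a minimal Python sketch of a Gaussian kernel density estimate over made-up data bounded in [0, 4]. It is a generic illustration, not bayesplot's internal code; the simulated data and the bandwidth rule (Silverman's rule of thumb) are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `data` at each point in `grid`."""
    data = np.asarray(data, dtype=float)
    n = data.size
    if bandwidth is None:
        # Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
        sd = data.std(ddof=1)
        iqr = np.subtract(*np.percentile(data, [75, 25]))
        bandwidth = 0.9 * min(sd, iqr / 1.34) * n ** (-0.2)
    # One normal "bump" centred on each observation, averaged over observations
    z = (np.asarray(grid, dtype=float)[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# Hypothetical outcome: averages of four 0-4 items, so every value lies in [0, 4]
rng = np.random.default_rng(1)
y = rng.integers(0, 17, size=200) / 4

grid = np.array([-0.5, -0.25, 0.0, 2.0, 4.25])
dens = gaussian_kde(y, grid)
for g, d in zip(grid, dens):
    print(f"density at y = {g:5.2f}: {d:.4f}")
```

Each observation contributes a normal bump whose tails are never exactly zero, so the curve is positive just below 0 even though no observation is there. A bar or histogram display of the raw values does not have this artefact.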

Great, thanks for the quick help! You mean something like a histogram?

For example yes.

ppc_bars() might be what you are after. It’s an ordinal model, right?

(Fast, but not faster than @paul.buerkner 😊)

Thanks, OK. I actually used skew_normal because it’s a composite variable.

Then you’ll probably find that the model does predict values outside the range of the variable, as the skew-normal distribution you’ve assigned it in the model has support on the whole real line, not just 0 to 4. That might or might not be a problem, depending on what you’re doing.
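
A quick way to get a feel for how much probability a skew-normal puts outside 0 to 4 is to simulate from it. This is a minimal Python sketch using the classic location/scale/shape construction (Azzalini). The parameter values are hypothetical, and as far as I know brms’s skew_normal family is parameterized through the mean rather than the location, so the numbers are purely illustrative.

```python
import numpy as np

def rskew_normal(n, loc, scale, alpha, rng):
    """Draw n samples from a skew-normal(loc, scale, alpha) via the standard
    construction Z = delta * |U0| + sqrt(1 - delta^2) * U1, with U0, U1 ~ N(0, 1)."""
    delta = alpha / np.sqrt(1.0 + alpha**2)
    u0 = rng.standard_normal(n)
    u1 = rng.standard_normal(n)
    z = delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1
    return loc + scale * z

rng = np.random.default_rng(2024)
# Hypothetical parameters for a left-skewed outcome on a 0-4 scale
draws = rskew_normal(100_000, loc=3.2, scale=1.0, alpha=-3.0, rng=rng)

below = np.mean(draws < 0)  # share of draws under the lower bound
above = np.mean(draws > 4)  # share of draws over the upper bound
print(f"P(y < 0) ~ {below:.4f}, P(y > 4) ~ {above:.4f}")
```

Whatever the parameters, both shares are strictly positive; the question is only whether they are small enough not to matter for the inference at hand.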

There is a small number of predictions that go below 0 and above 4, but only a few, so I thought it would be OK. But then I’m also new to this…

I tried a truncated model before but that did not work well.

Whether it’s OK or not is really up to you to decide, as the modeller, as long as you don’t hide your assumptions when communicating your analysis (in my humble opinion). Is the skew-normal distribution a useful approximation or something that might bias any inference you want to make? Is the variable truly continuous between 0 and 4? You said it’s a composite variable - is it a sum of four variables that range from 0-1?

It’s the average of four items that each have a Likert-type scale with five response options (e.g., completely disagree to completely agree). All my variables concern motivational beliefs/attitudes. I thought it was a fair assumption that the true values for the concept have a skew-normal distribution.
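
Because the composite is an average of four five-option items, it can only take a limited set of values. A small Python sketch that enumerates them, assuming (as the 0 to 4 range suggests) that each item is scored 0 to 4:

```python
from itertools import product

# Assumption: each of the four items is scored 0-4 (five response options),
# which matches the 0-4 range of the composite.
n_items, options = 4, range(5)

# Every possible average score across all 5^4 = 625 response patterns
possible = sorted({sum(pattern) / n_items for pattern in product(options, repeat=n_items)})

print(len(possible))  # number of distinct composite values
print(possible)
```

So the outcome is bounded and takes only 17 distinct values in steps of 0.25. That is part of why a bar-style check or an ordinal/IRT likelihood may match these data more closely than a continuous distribution.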

Right! And the parameter you are interested in is the distribution of the latent trait assumed to be measured by these four scale items? If so, I’d consider fitting some kind of IRT model instead, like a graded response model or a rating scale model, which would be closer to your actual data. I think there is a paper by @paul.buerkner giving a tutorial for fitting such models with brms.

Edit: Yes there was: https://arxiv.org/abs/1905.09501
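
For reference, a graded response model links a latent trait theta to each item’s ordered categories through cumulative logistic curves. Below is a minimal Python sketch of the category probabilities for a single item, with made-up discrimination and threshold values. It is a generic illustration of the model, not the brms implementation from the tutorial.

```python
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under Samejima's graded response model.

    P(Y >= k) = logistic(a * (theta - b_k)) for k = 1..K-1, with P(Y >= 0) = 1
    and P(Y >= K) = 0; then P(Y = k) = P(Y >= k) - P(Y >= k + 1).
    """
    b = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(b) <= 0):
        raise ValueError("thresholds must be strictly increasing")
    p_geq = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # cumulative P(Y >= k), k = 1..K-1
    p_geq = np.concatenate(([1.0], p_geq, [0.0]))
    return p_geq[:-1] - p_geq[1:]                   # P(Y = k), k = 0..K-1

# Hypothetical item: discrimination 1.5, four thresholds -> five response options (0-4)
probs = grm_category_probs(theta=0.5, a=1.5, thresholds=[-2.0, -0.5, 0.8, 2.0])
print(np.round(probs, 3), probs.sum())
```

In a full IRT model, each person’s theta and each item’s discrimination and thresholds are estimated jointly from the responses to all four items, rather than from their average.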

And the parameter you are interested in is the distribution of the latent trait assumed to be measured by these four scale items?

Yes exactly.

Doing Bayesian analysis, and the kind of model I am fitting now (a two-level cross-classified multiple membership model), has already been such a massive learning curve for me that I am a little overwhelmed by all the options out there. I have looked at Paul’s tutorial, but because I have never used IRT, I thought I would stick to what I know if possible. But if the skew-normal is inappropriate, I might have to reconsider this.

Oh, I know exactly what you mean about massive learning curves. I can relate to overwhelmedness!

I’m not the one to tell you what is and isn’t appropriate. All models are approximations anyway. I’m not very familiar with brms, but might it be easier to extract and modify the Stan code directly to insert a graded response model connecting the latent trait parameter (which I assume you are using in your cross-classified model) and the observed responses to the questionnaire?

I guess an important question is how much you trust the measurement properties of your Likert items. If not very much, there might be considerable gains to be made by modelling those measurement properties. But if you do, and believe the skew-normal is a reasonable approximation, perhaps your best course of action is to stick to your plan and write convincingly about why the approximation is good enough. You’re the one who knows your data, your model and your research question.

I’ve never used Stan code directly, only brms… but the more confident I get with this, the more I can venture into new territory :). I will definitely give it another thought! Thanks for helping me with this!
