Pp_check shows implausible values for y

Pia · November 26, 2019, 9:02am

When I do the posterior predictive check with pp_check, the density plot shows y values that I do not actually have in my data set. My y values all fall between 0 and 4 but the plot shows values for both y and yrep that are below 0. Where do these wrong y values come from and does this tell me there is something wrong with my model?
Thanks

paul.buerkner · November 26, 2019, 9:04am

This is because of the smoothing done during plotting and is just a visual effect. I would recommend using another pp check type for that purpose.
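The smoothing effect can be reproduced outside of brms. The following is a minimal sketch in plain Python, not what pp_check does internally: the data values, the bandwidth, and the helper name `gaussian_kde` are all made up for illustration. It shows that a Gaussian kernel density estimate assigns positive density below 0 even though no observation lies there.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Each observation contributes a small Gaussian bump centred on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical outcome values, all inside [0, 4] as in the original post.
y = [0.0, 0.25, 0.5, 0.5, 1.0, 1.5, 2.0, 2.25, 3.0, 4.0]

# Bandwidth chosen arbitrarily for the illustration.
density = gaussian_kde(y, bandwidth=0.4)

# No observation is below 0, yet the smoothed curve is positive there.
n_below_zero = sum(1 for v in y if v < 0)
density_below_zero = density(-0.25)

print(n_below_zero)
print(density_below_zero > 0)
```

A histogram or bar plot of the same values would show nothing below 0, because those plot types count observations instead of smoothing over them.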

Pia · November 26, 2019, 9:05am

Great, thanks for the quick help! You mean something like a histogram?

paul.buerkner · November 26, 2019, 9:16am

For example yes.

erognli · November 26, 2019, 9:16am

ppc_bars() might be what you are after. It’s an ordinal model, right?

(Fast, but not faster than @paul.buerkner 😊)

Pia · November 26, 2019, 9:29am

Thanks OK. I actually used skew_normal because it’s a composite variable.

erognli · November 26, 2019, 11:14am

Then you’ll probably find that the model does predict values outside the range of the variable, as the distribution you’ve assigned it in the model probably has a larger range than the variable actually has. That might or might not be a problem, depending on what you’re doing.
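To make this point concrete, here is a small simulation sketch in plain Python. The location, scale, and skewness values are invented for illustration and are not taken from Pia's model. The sketch also uses the direct location/scale parameterization of the skew-normal, which is not necessarily the parameterization brms uses. It only shows that a skew-normal has unbounded support, so some share of draws lands outside 0 to 4.

```python
import math
import random


def skew_normal_draw(rng, xi, omega, alpha):
    """Draw one skew-normal value via the standard two-normal representation.

    xi = location, omega = scale, alpha = skewness (all hypothetical here).
    """
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    z0 = abs(rng.gauss(0.0, 1.0))  # half-normal component
    z1 = rng.gauss(0.0, 1.0)       # independent normal component
    z = delta * z0 + math.sqrt(1.0 - delta * delta) * z1
    return xi + omega * z


rng = random.Random(2019)  # fixed seed so the run is repeatable
draws = [
    skew_normal_draw(rng, xi=3.0, omega=1.2, alpha=-3.0) for _ in range(10000)
]

# Share of simulated values outside the range the variable can actually take.
share_outside = sum(1 for d in draws if d < 0.0 or d > 4.0) / len(draws)
print(share_outside)
```

How large a share is tolerable is a modelling judgment, not something the simulation can settle.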

Pia · November 26, 2019, 11:24am

There is a small number of predictions that go below 0 and above 4. But only a few, so I thought it would be OK. But then I’m also new to this…

I tried a truncated model before but that did not work well.

erognli · November 26, 2019, 11:31am

Whether it’s OK or not is really up to you to decide, as the modeller, as long as you don’t hide your assumptions when communicating your analysis (in my humble opinion). Is the skew-normal distribution a useful approximation or something that might bias any inference you want to make? Is the variable truly continuous between 0 and 4? You said it’s a composite variable - is it a sum of four variables that range from 0 to 1?

Pia · November 26, 2019, 11:36am

It’s the average of four items that have a Likert-type scale with five response options (e.g., completely disagree to completely agree). All my variables concern motivational beliefs/attitudes. I thought it was a fair assumption that the true values for the concept have a skew-normal distribution.
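On the earlier question of whether the variable is truly continuous: assuming each item is coded 0 to 4 (an assumption, though it matches the 0 to 4 range of y mentioned at the start of the thread), the average of four items can only take a limited set of values. A quick sketch that enumerates them:

```python
from itertools import product

ITEM_SCORES = range(5)  # assumed coding: 0, 1, 2, 3, 4
N_ITEMS = 4

# Every possible response pattern across the four items, reduced to its mean.
composite_values = sorted(
    {sum(responses) / N_ITEMS for responses in product(ITEM_SCORES, repeat=N_ITEMS)}
)

print(len(composite_values))
print(composite_values)
```

Under that coding the composite sits on a grid of 17 points spaced 0.25 apart, so a continuous distribution such as the skew-normal is an approximation to a discrete, bounded outcome.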

erognli · November 26, 2019, 11:42am

Right! And the parameter you are interested in is the distribution of the latent trait assumed to be measured by these four scale items? If so, I’d consider fitting some kind of IRT model instead, like a graded response model or rating scale model, which would be closer to your actual data. I think there is a paper by @paul.buerkner giving a tutorial for fitting such models with brms.
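For readers who have not seen a graded response model, here is a rough sketch of how it turns a latent trait value into probabilities for the five response options of one item. It uses the common textbook form, P(Y >= k) = logistic(a * (theta - b_k)). The function name and all parameter values are hypothetical, and the parameterization may differ from the one used in brms or in the tutorial.

```python
import math


def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under a graded response model.

    theta          -- latent trait value of the respondent
    discrimination -- item discrimination a (must be positive)
    thresholds     -- increasing thresholds b_1 < ... < b_{K-1}
    """

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    # Cumulative probabilities P(Y >= k), padded with 1 and 0 at the ends.
    cumulative = (
        [1.0]
        + [logistic(discrimination * (theta - b)) for b in thresholds]
        + [0.0]
    )
    # P(Y = k) is the difference between adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item with five response options (four thresholds).
probs = grm_category_probs(
    theta=0.5, discrimination=1.5, thresholds=[-1.5, -0.5, 0.5, 1.5]
)
print(probs)
```

Because the model assigns probability only to the five observed response options, its predictions cannot fall outside the scale.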

Edit: Yes there was: https://arxiv.org/abs/1905.09501

Pia · November 26, 2019, 11:46am

“And the parameter you are interested in is the distribution of the latent trait assumed to be measured by these four scale items?”

Yes exactly.

Doing Bayesian analysis and the kind of model I am doing now (two-level cross-classified multiple membership model) has already been such a massive learning curve for me that I am a little overwhelmed by all the options out there. I have looked at the tutorial from Paul, but because I have never used IRT, I thought I would stick to what I know if possible. But if the skew-normal is inappropriate, I might have to reconsider this.

erognli · November 26, 2019, 11:56am

Oh, I know exactly what you mean about massive learning curves. I can relate to overwhelmedness!

I’m not the one to tell you what is and isn’t appropriate. All models are approximations anyway. I’m not very familiar with brms, but might it be easier to extract and modify the Stan code directly to insert a graded response model connecting the latent trait parameter (which I assume you are using in your cross-classified model) and the observed responses to the questionnaire?

I guess an important question is how much you trust the measurement properties of your Likert items. If not very much, there might be considerable gains to be made by modelling those measurement properties. But if you do, and believe the skew-normal is a reasonable approximation, perhaps your best course of action is to stick to your plan and write convincingly about why the approximation is good enough. You’re the one who knows your data, your model and your research question.

Pia · November 26, 2019, 11:59am

I’ve never used Stan code but always used brms… but the more confident I get with this, the more I can venture into new territory :). I will definitely give it another thought! Thanks for helping me with this!
