Posterior predictive checks in ShinyStan

Hi,

This is probably a very simple problem, but I just can’t figure it out. I’ve estimated a multivariate multiple membership model using brms, launched ShinyStan, and now want to run the graphical posterior predictive checks (PPCs). I’m having difficulty figuring out what information I have to provide under the ‘Select data’ menu. There are three fields:

  1. Select y (vector of observations): Object from global environment
  2. Select yrep (posterior predictive replications):
     a) Parameter/generated quantity from model
     b) Or object from global environment

For y (1), I only have the option to select my data or the model I estimated. When I do so, I get the error message “y should be a numeric vector”. So what am I supposed to put here?

Thanks,
Pia

Right now it’s not as flexible as we’d like, but I think if you assign the outcome variable from your data list as an object in the global environment it should let you select it (e.g. y <- data$y). Does that work?
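A minimal sketch of that suggestion, using a toy data frame in place of the real data. The names `dat` and `outcome` are hypothetical; with a brms fit, the data used for fitting should be available as `fit$data`, and for a multivariate model each response would need to be extracted separately.

```r
# Toy stand-in for the data passed to the model (names are hypothetical).
dat <- data.frame(
  outcome = c(0, 1, 2, 4, 3),
  group   = c("a", "b", "a", "b", "a")
)

# Selecting the whole data frame (or the fitted model) cannot satisfy
# "y should be a numeric vector"; extract the single outcome column instead.
y <- as.numeric(dat$outcome)

# Sanity checks before selecting `y` in the 'Select data' menu:
# numeric, no dimensions (not a matrix or data frame), no missing values.
stopifnot(is.numeric(y), is.null(dim(y)), !anyNA(y))
```

The object has to live in the global environment of the R session from which ShinyStan is launched, so it should be created before launching.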

If that doesn’t work then there’s a bug we need to fix. Until it’s fixed, I recommend making the PPC plots with the bayesplot package (it also offers many more plots than are currently in shinystan).
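For reference, a rough sketch of a density-overlay PPC with bayesplot. The data here are simulated stand-ins, not output from the model in this thread; with a brms fit, `yrep` would typically come from something like `brms::posterior_predict(fit)` (one response at a time for a multivariate model), which is an assumption about the workflow rather than something tested here.

```r
library(bayesplot)

set.seed(1)

# Simulated stand-ins: y is the observed outcome, yrep holds one
# replicated data set per row (rows = draws, columns = observations).
y <- rnorm(100)
yrep <- matrix(rnorm(50 * 100), nrow = 50, ncol = 100)

# yrep must have as many columns as y has elements.
stopifnot(ncol(yrep) == length(y))

# Density of y overlaid on the densities of the replicated data sets.
p <- ppc_dens_overlay(y, yrep)
print(p)
```

Passing only a subset of draws (here 50 rows) keeps the overlay readable.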

Thanks, Jonah. Unfortunately this did not work as I am still getting the same error message when selecting the new “y” object for my outcome variable. I’ll thus use the bayesplot package for now.

Can I just quickly ask a question about the PPC plots? I plotted y and yrep and am a little confused, because the y values appear to be unrealistic even though I thought they represent the observed values. My scale ranges from 0 to 4, but the plots include values below and above that. Am I misunderstanding something here?

Picture1

Is there any data outside the range 0 to 4? If not, then it’s just an artifact of kernel density estimation (without boundary correction).
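The effect is easy to reproduce with base R's `density()`, which also applies no boundary correction by default. This is only an illustration with made-up data on a 0 to 4 scale, not the data from the plot above.

```r
set.seed(1)

# Hypothetical outcome confined to the scale 0..4.
y <- sample(0:4, size = 500, replace = TRUE)

# Default Gaussian kernel density estimate, no boundary correction.
d <- density(y)

range(y)    # the data stay inside [0, 4]
range(d$x)  # the evaluation grid extends past both ends

# Share of the estimated density that falls outside [0, 4]
# (simple Riemann sum on the evenly spaced grid).
step <- diff(d$x[1:2])
outside <- sum(d$y[d$x < 0 | d$x > 4]) * step
outside
```

Each observation contributes a small smooth bump centred on its value, so observations at 0 and 4 spread some of their mass beyond the ends of the scale.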

Yeah it’s just an artifact of the kernel density estimation. As far as I know there’s not a way to fix this using ggplot2::stat_density() (which is what is used under the hood), but I could be wrong. It’s definitely something I would fix if possible.

ArviZ does fix this with an FFT-based KDE.

For ggplot, this could be fixed by mirroring the data around the edges and then manually integrating over the valid range (the mirror width needs to be large enough). But true, it’s probably not possible with ggplot.
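To make the mirroring idea concrete, here is a minimal base R sketch of a reflection-corrected density on a bounded scale. It is not what bayesplot, ggplot2, or ArviZ actually do internally; the bounds, the simulated data, and the choice to reflect the full sample are assumptions made for illustration.

```r
set.seed(1)

lower <- 0
upper <- 4
y <- runif(500, lower, upper)  # hypothetical data inside [0, 4]

# Reflect the data around each boundary so that mass which would leak
# out of the valid range is folded back in.
y_mirrored <- c(2 * lower - y, y, 2 * upper - y)

# Choose the bandwidth from the original data, then evaluate the
# estimate on the valid range only.
bw <- bw.nrd0(y)
d <- density(y_mirrored, bw = bw, from = lower, to = upper, n = 512)

# The mirrored sample is three times as large as the original,
# so rescale the density on [lower, upper] accordingly.
dens <- 3 * d$y

# Check: the trapezoidal integral over the valid range should be close to 1.
area <- sum((dens[-1] + dens[-length(dens)]) / 2 * diff(d$x))
area
```

Plotting `d$x` against `dens` gives a curve that stops at the boundaries instead of tailing off beyond them.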

Yeah, I was hoping to avoid doing something like what it sounds like you’re doing in ArviZ. I might end up doing it at some point if ggplot2 doesn’t provide a way to do it soonish (it probably won’t).


Thanks for your response. I’m a relative novice to R, so I have no idea what something like “mirroring data around the edges and then manually integrating on the valid range” means. I definitely have no data outside that range. Can I assume that if yrep is similar to y (even outside the actual range), the prediction is quite good and I have a sound model?

That was just a comment about how the KDE could be implemented, not something users should need to do.

Oh good, that’s a relief. So would you say I can trust that the prediction is OK if yrep is similar to y, as shown in the graph?

Yeah that plot looks pretty good!