Plotting by group in brms with pp_check

I’m trying to use pp_check by group. It might be related to this:

Given this data:

df_pupil_data <- structure(list(trial = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41), load = c(2, 
1, 5, 4, 0, 3, 0, 4, 2, 3, 5, 1, 4, 3, 0, 1, 2, 5, 1, 5, 2, 3, 
0, 4, 2, 1, 3, 0, 5, 4, 5, 1, 2, 3, 4, 0, 3, 4, 1, 2, 0), p_size = c(1021, 
951, 1064, 913, 603, 826, 464, 758, 733, 591, 879, 851, 772, 
829, 455, 588, 713, 708, 708, 787, 567, 722, 540, 667, 698, 653, 
576, 626, 573, 726, 752, 727, 625, 683, 687, 636, 614, 655, 557, 
645, 601), c_load = c(0, -1, 3, 2, -2, 1, -2, 2, 0, 1, 3, -1, 
2, 1, -2, -1, 0, 3, -1, 3, 0, 1, -2, 2, 0, -1, 1, -2, 3, 2, 3, 
-1, 0, 1, 2, -2, 1, 2, -1, 0, -2)), row.names = c(NA, -41L), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

And this model:

fit_pupil <- brm(p_size ~ c_load,
                 data = df_pupil_data,
                 family = gaussian(),
                 prior = c(
                     prior(normal(800, 300), class = Intercept),
                     prior(normal(0, 1000), class = sigma),
                     prior(normal(0, 100), class = b)
                 ))

I want to plot posterior predict distributions by load.

I could do it with bayesplot:

yrep <- posterior_predict(fit_pupil, nsamples = 1000)
y <- df_pupil_data$p_size
ppc_violin_grouped(y, yrep,  group= df_pupil_data$load, probs = c(.1,.9), alpha = 0.1, y_draw = "points", y_size = 1.5)

But I couldn’t in brms, all these didn’t work

pp_check(fit_pupil, nsamples = 100, type = "violin_grouped", group = "c_load")
pp_check(fit_pupil, nsamples = 100, type = "violin_grouped", group = "load")
pp_check(fit_pupil, nsamples = 100, type = "violin_grouped", group = "factor(c_load)")
pp_check(fit_pupil, nsamples = 100, type = "violin_grouped", group = "df_pupil_data$c_load")
pp_check(fit_pupil, nsamples = 100, type = "violin_grouped", group = df_pupil_data$c_load)

Please also provide the following information in addition to your question:

  • Operating System: Ubuntu 18.04.1
  • brms Version: 1.10

The problem is that group needs to be a discrete predictor. But your model spec treats c_load as continuous predictor.

df_pupil_data$c_load_f <- factor(df_pupil_data$c_load)
fit_pupil <- brm(p_size ~ c_load_f,
                 data = df_pupil_data,
                 family = gaussian(),
                 prior = c(
                     prior(normal(800, 300), class = Intercept),
                     prior(normal(0, 1000), class = sigma),
                     prior(normal(0, 100), class = b)
                 ))
pp_check(fit_pupil, nsamples = 100, type = "violin_grouped", group = "c_load_f")

works as expected. The error message hints you that you have no valid grouping (but I can see how it might come across as not making sense).

Also a small piece of feedback: “didn’t work” is quite a general phrase. It is usually more useful to be more
specifc “threw this error” / “showed empty plot” / …

Best of luck with your model!

Sorry about the “didn’t work”.
I don’t want to model it as a categorical predictor, I want to have it as continuous. I know it’s not strictly right, but I want to assume that the relationship is linear and to predict loads that are not in the data.

Notice that ppc_violin_grouped doesn’t require me to have the predictor as a factor. But given that the example with c_load_f worked, it might be a limitation of brms.

I will relax the checks associated with the group argument of pp_check for brmsfit objects.

1 Like

Would it be possible to group by something that’s not in the model (but it’s in the data?). Ideally one could add a vector here instead of a string (as ppc_* does).

You might also be interested in monotonic effects (sorry, if you know about this already, but seems like it could get you best of both worlds).

yes, I know. (This is anyway for teaching linear regression, not my own research).
But the problem is that you cannot generalize to a new level, right? If my data has 5 levels of load, I cannot predict what would happen with a 6th level. For that I need to commit to some relationship between load and the DV. AFAIK, you cannot do that with monotonic effects.

1 Like

Yeah, you are correct, I missed that you need to predict for new loads.

It is possible to predict new levels with monotonic effects although not necessarily sensible (I didn’t do much research on that). You do that by adding a new row to your data with the level you want to predict and with the response set to NA. Then you use mi() terms to allow for modelling missing values in the response.

1 Like

I just relaxed the checks on the group argument so that you should now be able to pass any variable in the model.

1 Like

That’s great! But what about in the data? Can you pass a variable that wasn’t used in the model? Can you just pass a vector as ppc_ from bayesplo?? (That would be amazing, but I understand if it’s not your priority)

This wont work for the time being. Only variables used in the model are stored and only those can be used. If you want more flexibility you need to use bayesplot directly.

1 Like