Difference between kfold(..., joint = "group") and kfold(..., joint = FALSE) for a brms model

I am trying to understand the difference between kfold(..., joint = "group") and kfold(..., joint = FALSE) for a brms model with a grouping variable. The grouping variable, pid, is used both when fitting the model (as a random-effect grouping variable, |pid) and in the kfold(..., group = "pid") call.

The data are binomial responses from a psychological experiment in which each participant worked on five different conditions (baserates). I want to compare different cognitive models, each specified through a custom_family() in brms. Each candidate model defines multiple distributional parameters. For one distributional parameter we only have random intercepts, ~ 1 + (1|p|pid); for the other distributional parameter we estimate one individual-level coefficient per baserate and participant, 0 + baserate + (0 + baserate|p|pid).
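
To make the structure more concrete, here is a minimal sketch of how the formulas are organized. The family name, the second distributional parameter name "theta", and the response variable are placeholders, and the Stan implementation of the custom family plus the response/trials specification are omitted; the real definitions are in the script linked further down.

```r
library(brms)

# hypothetical two-parameter custom family standing in for the real one;
# its Stan implementation and response specification are omitted here
roc_family <- custom_family(
  "roc_model",
  dpars = c("mu", "theta"),
  links = c("identity", "identity"),
  type  = "int"
)

form <- bf(
  # one individual-level coefficient per baserate and participant
  resp ~ 0 + baserate + (0 + baserate | p | pid),
  # random intercepts only for the second distributional parameter
  theta ~ 1 + (1 | p | pid),
  family = roc_family
)
```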

My goal is to perform leave-one-participant-out cross-validation using kfold(). I understand that by specifying kfold(..., group = "pid") each fold excludes the data of exactly one participant. However, I do not understand the difference between the default joint = FALSE and joint = "group". A sketch of the two variants is shown below.
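
For reference, the two variants look roughly like this (a sketch; dat, n_pid, and the object names are simplified stand-ins for what is in my script, and mod1/mod2 are the fitted brmsfit objects):

```r
# leave-one-participant-out: one fold per participant
n_pid <- length(unique(dat$pid))

kf1_pointwise <- kfold(mod1, K = n_pid, group = "pid", joint = FALSE)    # default
kf1_joint     <- kfold(mod1, K = n_pid, group = "pid", joint = "group")
# the same two calls are run for mod2 before comparing the results
# with loo_compare()
```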

With joint = FALSE there is no meaningful difference between the two models:

> loo_compare(mod1, mod2)
       elpd_diff se_diff
mod2   0.0       0.0   
mod1  -3.1       4.7   

With joint = "group" there is evidence for a difference between the two models:

       elpd_diff se_diff
mod2   0.0       0.0  
mod1 -20.2       9.0  

I am trying to understand the difference between the two kfold() calls. My goal is to predict the data of the held-out participant across all baserates together, so I assume joint = "group" is correct.


If someone is interested in all details and code, everything is available here: gumbel-reanalysis/fit-binary-roc.R at main · singmann/gumbel-reanalysis · GitHub
The results presented here are for data set 2 in that script (which covers quite a few data sets). However, I do not think those details are necessary to understand the question.

I have checked the brms source code and I think I have figured the difference out.

When using joint = FALSE (the default), the returned pointwise ELPD value is log_mean_exp() computed separately for each observation across posterior draws. In my case, each of the five observations per participant is treated individually and gets its own log_mean_exp() value.

When using joint = "group", the individual log-likelihood values within each group are first summed up within each posterior draw. The returned pointwise ELPD is then log_mean_exp() across draws of these summed log-likelihood values, i.e., one value per group.
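
In code, the distinction is roughly the following (a sketch with simulated numbers; ll stands for a draws-by-observations matrix of held-out log-likelihood values from one fold, and pid assigns each observation to its participant):

```r
# fake inputs just for illustration
set.seed(1)
ll  <- matrix(rnorm(4000 * 10, mean = -2), nrow = 4000)  # 4000 draws, 10 obs
pid <- rep(1:2, each = 5)                                # 2 participants x 5 obs

log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# joint = FALSE: one pointwise ELPD value per observation
elpd_obs <- apply(ll, 2, log_mean_exp)

# joint = "group": sum the log-likelihoods within each participant per draw,
# then one pointwise ELPD value per participant
ll_group   <- sapply(split(seq_len(ncol(ll)), pid),
                     function(idx) rowSums(ll[, idx, drop = FALSE]))
elpd_group <- apply(ll_group, 2, log_mean_exp)
```

So the pointwise vector has one entry per observation in the first case and one entry per participant in the second.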

Yes, that is indeed correct.


I’ll add that this result indicates that at least in one of the models the joint predictive distribution for the group has strong dependency. See the illustration in Figure 1 of Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors. joint = "group" should be as sensitive as, or more sensitive than, joint = FALSE. Happy to see a real-life example!

Thanks for these clarifications. This is very helpful.

Just to provide a bit more detail: in our case, each group is exactly one individual participant (each participant provided five observations, each in a different experimental condition). Thus, we absolutely expect dependencies between the observations within a group, so joint = "group" is what makes sense. In contrast, making out-of-sample predictions for individual observations within a group (i.e., joint = FALSE) does not seem very meaningful or helpful.

Another interesting aspect of our analysis is that we compare the two models across eight different (previously collected) data sets. For only two of them do we see a difference between the two methods, and in both cases the pattern is the same: there is evidence for a difference with joint = "group" but not with joint = FALSE.