Difference between kfold(..., joint = "group") and kfold(..., joint = FALSE) for a brms model

I am trying to understand the difference between kfold(..., joint = "group") and kfold(..., joint = FALSE) for a brms model with a grouping variable. The grouping variable, pid, is used both when fitting the model (as a random-effect grouping variable, |pid) and in the kfold(..., group = "pid") call.

The data are binomial responses from a psychological experiment in which each participant worked on five different conditions (baserates). I want to compare different cognitive models, each specified through a custom_family() in brms. Each candidate model defines multiple distributional parameters. For one distributional parameter we only have random intercepts, ~ 1 + (1|p|pid); for the other distributional parameter we estimate one individual-level coefficient per baserate and participant, 0 + baserate + (0 + baserate|p|pid).
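
To make the structure more concrete, here is a minimal sketch of how the formulas are organized. The family name, the second distributional parameter name "theta", and the response variable are placeholders, and the Stan implementation of the custom family plus the response/trials specification are omitted; the real definitions are in the script linked further down.

```r
library(brms)

# hypothetical two-parameter custom family standing in for the real one;
# its Stan implementation and response specification are omitted here
roc_family <- custom_family(
  "roc_model",
  dpars = c("mu", "theta"),
  links = c("identity", "identity"),
  type  = "int"
)

form <- bf(
  # one individual-level coefficient per baserate and participant
  resp ~ 0 + baserate + (0 + baserate | p | pid),
  # random intercepts only for the second distributional parameter
  theta ~ 1 + (1 | p | pid),
  family = roc_family
)
```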

My goal is to perform leave-one-participant-out cross-validation using kfold(). I understand that by specifying kfold(..., group = "pid") each fold excludes the data of exactly one participant. However, I do not understand the difference between the default joint = FALSE and joint = "group". A sketch of the two variants is shown below.
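
For reference, the two variants look roughly like this (a sketch; dat, n_pid, and the object names are simplified stand-ins for what is in my script, and mod1/mod2 are the fitted brmsfit objects):

```r
# leave-one-participant-out: one fold per participant
n_pid <- length(unique(dat$pid))

kf1_pointwise <- kfold(mod1, K = n_pid, group = "pid", joint = FALSE)    # default
kf1_joint     <- kfold(mod1, K = n_pid, group = "pid", joint = "group")
# the same two calls are run for mod2 before comparing the results
# with loo_compare()
```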

With joint = FALSE there is no meaningful difference between the two models:

> loo_compare(mod1, mod2)
       elpd_diff se_diff
mod2   0.0       0.0   
mod1  -3.1       4.7   

With joint = "group" there is evidence for a difference between the two models:

       elpd_diff se_diff
mod2   0.0       0.0  
mod1 -20.2       9.0  

I am trying to understand the difference between the two kfold() calls. My goal is to predict the data of the held-out participant across all baserates together, so I assume joint = "group" is correct.


If someone is interested in all details and code, everything is available here: gumbel-reanalysis/fit-binary-roc.R at main · singmann/gumbel-reanalysis · GitHub
The results presented here are for data set 2 in that script (which covers quite a few data sets). However, I do not think those details are necessary to understand the question.

I have checked the brms source code and I think I have figured the difference out.

When using joint = FALSE (the default), the returned pointwise ELPD value is log_mean_exp() computed separately for each observation across posterior draws. In my case, each of the five observations per participant is treated individually and gets its own log_mean_exp() value.

When using joint = "group", the individual log-likelihood values within each group are first summed up within each posterior draw. The returned pointwise ELPD is then log_mean_exp() across draws of these summed log-likelihood values, i.e., one value per group.
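
In code, the distinction is roughly the following (a sketch with simulated numbers; ll stands for a draws-by-observations matrix of held-out log-likelihood values from one fold, and pid assigns each observation to its participant):

```r
# fake inputs just for illustration
set.seed(1)
ll  <- matrix(rnorm(4000 * 10, mean = -2), nrow = 4000)  # 4000 draws, 10 obs
pid <- rep(1:2, each = 5)                                # 2 participants x 5 obs

log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# joint = FALSE: one pointwise ELPD value per observation
elpd_obs <- apply(ll, 2, log_mean_exp)

# joint = "group": sum the log-likelihoods within each participant per draw,
# then one pointwise ELPD value per participant
ll_group   <- sapply(split(seq_len(ncol(ll)), pid),
                     function(idx) rowSums(ll[, idx, drop = FALSE]))
elpd_group <- apply(ll_group, 2, log_mean_exp)
```

So the pointwise vector has one entry per observation in the first case and one entry per participant in the second.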

Yes, that is indeed correct.


I’ll add that this result indicates that at least in one of the models the joint predictive distribution for the group has strong dependency. See the illustration in Figure 1 of Cross-Validatory Model Selection for Bayesian Autoregressions with Exogenous Regressors. joint = "group" should be as sensitive as, or more sensitive than, joint = FALSE. Happy to see a real-life example!

Thanks for these clarifications. This is very helpful.

Just to provide a bit more detail: in our case, each group is exactly one individual participant (each participant provided five observations, each in a different experimental condition). Thus, we absolutely expect dependencies between the observations within a group, so joint = "group" is what makes sense. In contrast, making out-of-sample predictions for individual observations within a group (i.e., joint = FALSE) does not seem very meaningful or helpful.

Another interesting aspect of our analysis is that we compare the two models across eight different (previously collected) data sets. For only two of them do we see a difference between the two methods, and in both cases the pattern is the same: there is evidence for a difference with joint = "group" but not with joint = FALSE.