I am trying to understand the difference between `kfold(..., joint = "group")` and `kfold(..., joint = FALSE)` for a `brms` model with a grouping variable. The grouping variable, `pid`, is used both when fitting the model (as a random-effect grouping variable, `|pid`) and in the `kfold(..., group = "pid")` call.
The data are binomial responses from a psychological experiment in which each participant worked on five different conditions (`baserate`s). I want to compare different cognitive models, each specified through a `custom_family()` in `brms`. Each candidate model defines multiple distributional parameters. For one distributional parameter we only have random intercepts, `~ 1 + (1|p|pid)`; for the other distributional parameter we estimate one individual-level coefficient per baserate and participant, `0 + baserate + (0 + baserate|p|pid)`.
My goal is to perform leave-one-participant-out cross-validation using `kfold()`. I understand that by specifying `kfold(..., group = "pid")` each fold excludes the data of exactly one participant. However, I do not understand the difference between the default `joint = FALSE` and `joint = "group"`.
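For reference, here is my current reading of the `joint` argument, based on the `kfold()` documentation. The notation is my own, and this is an assumption I have not verified against the source code. Write $y_{ji}$ for observation $i$ of participant $j$ and $\theta^{(s)}_{-j}$ for posterior draw $s$ from the model fitted without participant $j$. With `joint = FALSE` each held-out observation seems to be scored separately, whereas with `joint = "group"` all observations of a held-out participant seem to be scored together:

```latex
% Sketch under my own notation (not taken from the brms documentation):
% y_{ji}            observation i of participant j, i = 1, ..., n_j
% \theta^{(s)}_{-j} posterior draw s = 1, ..., S from the fit without participant j
\begin{align*}
\widehat{\mathrm{elpd}}_{\text{pointwise}}
  &= \sum_{j} \sum_{i=1}^{n_j}
     \log\!\left( \frac{1}{S} \sum_{s=1}^{S}
       p\bigl(y_{ji} \mid \theta^{(s)}_{-j}\bigr) \right), \\
\widehat{\mathrm{elpd}}_{\text{joint}}
  &= \sum_{j}
     \log\!\left( \frac{1}{S} \sum_{s=1}^{S} \prod_{i=1}^{n_j}
       p\bigl(y_{ji} \mid \theta^{(s)}_{-j}\bigr) \right).
\end{align*}
```

If this reading is right, the two versions differ only in whether the per-draw likelihoods are averaged over draws before or after multiplying across a participant's observations. The joint version would then keep the dependence between a participant's observations that comes from shared participant-level effects, while the pointwise version would ignore it.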
With `joint = FALSE` there is no meaningful difference between the two models:
> loo_compare(mod1, mod2)
elpd_diff se_diff
mod2 0.0 0.0
mod1 -3.1 4.7
With `joint = "group"` there is evidence for a difference between the two models:
elpd_diff se_diff
mod2 0.0 0.0
mod1 -20.2 9.0
What exactly differs between the two `kfold()` calls? My goal is to predict the data of the held-out participant across all baserates together, so I assume `joint = "group"` is correct.
If someone is interested in all details and code, everything is available here: gumbel-reanalysis/fit-binary-roc.R at main · singmann/gumbel-reanalysis · GitHub
The results presented here are for data set 2 in this set (of quite a few data sets). However, I do not think the full code is necessary to answer the question.