After some more reading on this forum and elsewhere (such as here and here), it seems like it might not be feasible to compute LOO when there are so many observations, as dropping a single observation out of hundreds of thousands is barely perceptible. The first link above has a good discussion on this subject.
Consequently, I have turned my efforts towards `brms::kfold()`, which also seems more intuitive to me, as there are natural groupings by individuals within my dataset that I can separate into more or less equal folds.
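For context, here is roughly how I plan to call it. This is only a sketch: `fit` and the `individual` grouping variable are placeholders for my actual model and grouping factor, and my (possibly wrong) reading of the docs is that `folds = "grouped"` keeps each individual's observations together in one fold:

```r
library(brms)

# 'fit' is a previously fitted brmsfit; 'individual' is the grouping
# factor in my data (both are placeholder names for illustration).
# My understanding: folds = "grouped" assigns whole individuals to
# folds, so the K folds are more or less equal-sized groups of people.
kf <- kfold(fit, K = 10, folds = "grouped", group = "individual")
```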
However, I have some new questions regarding its use:
- Am I correct in my understanding that `brms::kfold()` is just another way to approximate the PSIS-LOO information criterion? Can I call the result the same thing, just estimated a different way?
- I've come to realize that `brms::loo_subsample()` can pass arguments through to `loo::loo_subsample()`; of particular interest are the arguments that can help speed things up, like `draws`, `observations`, and `cores` (see the first sketch after this list). However, it's not clear from the `brms::kfold()` docs that I can do the same things there. From what I can gather, it seems I can pass chains/cores/threads arguments through to `brms::brm()`… So what exactly does `future_args` accomplish? Is it for running the folds in parallel, while chains/cores/threads apply within each fold (see the second sketch after this list)? Is there a rule of thumb for the most efficient way to set this up for large models?
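To make the `loo_subsample()` part concrete, this is the kind of call I have in mind. The values are made up, and I'm assuming the extra arguments are forwarded through `...` to `loo::loo_subsample()` (I've left out `draws` since I presume brms supplies the posterior draws internally):

```r
library(brms)

# Illustrative values only; my assumption is that arguments brms does
# not recognize are forwarded to loo::loo_subsample().
ls <- loo_subsample(
  fit,
  observations = 1000,  # number of observations in the subsample
  cores = 4             # parallelize the pointwise computations
)
```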
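And for the `future_args` question, here is the setup I have been imagining, under my (possibly wrong) assumption that setting a future plan makes `kfold()` run one fold per worker while the chains/cores arguments are forwarded to each fold's refit:

```r
library(brms)
library(future)

# Assumption: with a multisession plan, kfold() launches the K refits
# as parallel futures, one fold per worker.
plan(multisession, workers = 5)

kf <- kfold(
  fit,
  K = 10,
  folds = "grouped", group = "individual",  # placeholder grouping factor
  chains = 2, cores = 2,            # forwarded to each fold's refit (I think)
  future_args = list(seed = TRUE)   # extra control passed to future::future()
)

plan(sequential)  # restore sequential processing

# If this mental model is right, total CPU demand is roughly
# workers * cores-per-fold (here 5 * 2 = 10 cores).
```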
Thank you for your time. I know I've diverged a bit from the title of the question, so I can post another thread instead… But it seems like a natural progression that other users might end up here looking to solve slow LOO computations and be redirected to K-fold CV.