How to speed up `brms::loo_subsample()` for large models

After some more reading on this forum and elsewhere (such as here and here), it seems it may not be feasible to compute LOO with this many observations: dropping a single observation out of hundreds of thousands is barely perceptible. The first link above has a good discussion on this subject.

Consequently, I have turned my efforts towards brms::kfold(), which also seems more intuitive to me: there are natural groupings by individuals within my dataset that I can separate into roughly equal folds. However, I have some new questions regarding its use:

  1. Am I correct in my understanding that brms::kfold() is just another way to approximate the same quantity as PSIS-LOO? Can I call the result the same thing, just estimated a different way?
  2. I’ve come to realize that brms::loo_subsample() can pass arguments through to loo::loo_subsample(); of particular interest are the arguments that can help speed things up, like draws, observations, and cores. However, it’s not clear from the brms::kfold() docs that I can do the same there. From what I can gather, I can pass chains/cores/threads arguments through to brms::brm()… So what exactly does future_args accomplish? Is it used to run the folds themselves in parallel, while chains/cores/threads apply within each fold? Is there a rule of thumb for the most efficient way to set this up for large models?

Thank you for your time. I know I’ve diverged a bit from the title of the question, so I can post another thread instead… But it seems like a natural progression: other users may end up here trying to solve slow LOO computations and be redirected to kfold().