Comparing models with k-fold CV

When comparing models using k-fold CV, what is a good way to compare them?

I read this article


In it, the log likelihoods of the held-out data points are saved after k-fold refitting, and the models are compared with compare(loo(loglik_model1), loo(loglik_model2), loo(loglik_model3)).

Is this better than computing kfold(loglik_model1), kfold(loglik_model2), kfold(loglik_model3)?
My kfold function, shown below, basically sums the pointwise log predictive densities of the held-out points.

Do I need to pass the log likelihoods through loo(loglik_model1) instead of just calling kfold(loglik_model1), so that the effective number of parameters is taken into account?

library(matrixStats)

kfold <- function(log_lik_heldout) {
  # log_lik_heldout: S x N matrix of held-out log-likelihood draws
  # (rows = posterior draws, columns = observations)
  logColMeansExp <- function(x) {
    # should be more stable than log(colMeans(exp(x)))
    S <- nrow(x)
    colLogSumExps(x) - log(S)
  }
  # See equation (20) of @VehtariEtAl2016
  pointwise <- matrix(logColMeansExp(log_lik_heldout), ncol = 1)
  colnames(pointwise) <- "elpd"
  # See equation (21) of @VehtariEtAl2016
  elpd_kfold <- sum(pointwise)
  # var() of a one-column matrix returns a 1 x 1 matrix, so take the column as a vector
  se_elpd_kfold <- sqrt(ncol(log_lik_heldout) * var(pointwise[, 1]))
  out <- list(
    pointwise = pointwise,
    elpd_kfold = elpd_kfold,
    se_elpd_kfold = se_elpd_kfold)
  # structure(out, class = "loo")
  return(out)
}
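For illustration, here is a minimal sketch of how a held-out log-likelihood matrix like the one kfold expects might be assembled. It assumes a hypothetical toy model (y_i ~ Normal(mu, 1) with prior mu ~ Normal(0, 10^2)) whose posterior is available in closed form, so each of the K refits is just a conjugate update rather than an MCMC run. The values of N, K, S and the random fold assignment are illustrative choices, not taken from the article.

```r
# Hypothetical toy example of K-fold CV:
# y_i ~ Normal(mu, 1), prior mu ~ Normal(0, 10^2), so the posterior of mu
# is normal and can be sampled directly without MCMC.
set.seed(1)
N <- 40    # number of observations
K <- 10    # number of folds
S <- 1000  # posterior draws per refit
y <- rnorm(N, mean = 2, sd = 1)

# Balanced random fold assignment (each fold gets N / K observations)
fold <- sample(rep(seq_len(K), length.out = N))

# S x N matrix: column i holds log p(y_i | mu^(s)), with mu^(s) drawn from
# the posterior fitted WITHOUT the fold that contains observation i
log_lik_heldout <- matrix(NA_real_, nrow = S, ncol = N)

for (k in seq_len(K)) {
  train <- fold != k
  test <- fold == k
  # Conjugate normal update with known sigma = 1 and prior sd = 10
  post_prec <- 1 / 10^2 + sum(train)
  post_mean <- sum(y[train]) / post_prec
  mu_draws <- rnorm(S, mean = post_mean, sd = sqrt(1 / post_prec))
  # Evaluate each held-out point of fold k under every posterior draw
  for (i in which(test)) {
    log_lik_heldout[, i] <- dnorm(y[i], mean = mu_draws, sd = 1, log = TRUE)
  }
}
```

With a real model, the conjugate update would be replaced by refitting the model on the training folds. The resulting matrix (rows are draws, columns are observations) is the input that kfold(log_lik_heldout) expects.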

Thanks for asking! I only now realized that there is an error in that article!

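In the article, loo is also called on the log likelihoods from k-fold cross-validation, which doesn't make sense: loo expects draws from a posterior fitted to all the data and uses importance sampling to approximate leaving each point out, whereas held-out log likelihoods are already out-of-sample.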

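Neither loo nor kfold computes the effective number of parameters directly. The p_eff shown in the loo output is computed afterwards, as the difference between the in-sample log pointwise predictive density (lpd) and the cross-validated elpd, and is reported only for its additional diagnostic value.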
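A minimal sketch of that after-the-fact computation, assuming the definition p_eff = lpd - elpd, where lpd is evaluated with draws from the full-data fit and elpd is the cross-validated estimate. The helper names log_col_means_exp and p_eff_from_elpd are hypothetical, and the log-mean-exp is written in base R so the sketch does not depend on matrixStats.

```r
# Hypothetical helper: stable log(colMeans(exp(x))) in base R
log_col_means_exp <- function(x) {
  m <- apply(x, 2, max)                     # column maxima
  m + log(colMeans(exp(sweep(x, 2, m))))    # shift by the max before exponentiating
}

# Hypothetical helper: p_eff = lpd (full-data fit, in-sample) - elpd (cross-validated)
# log_lik_full:   S x N log-likelihood draws from the model fitted to all the data
# elpd_pointwise: length-N pointwise elpd values, e.g. kfold(...)$pointwise
p_eff_from_elpd <- function(log_lik_full, elpd_pointwise) {
  lpd <- sum(log_col_means_exp(log_lik_full))
  lpd - sum(elpd_pointwise)
}
```

Nothing in this quantity feeds back into the elpd estimate itself, which is why it only serves as a diagnostic.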

Using the kfold function you included is the correct approach.

For model comparison, it is better to use the pointwise values. If you give the output the correct structure, you can use the loo_compare function to compute the elpd_diff values and the SE of the pointwise differences.
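A minimal sketch of the paired comparison in base R, assuming both models were evaluated on the same N observations (ideally with the same fold assignment), so the pointwise elpd vectors line up element by element. The function name compare_pointwise is hypothetical; it mirrors the elpd_diff / SE summary described above but is not loo_compare itself.

```r
# Hypothetical helper: compare two models from their pointwise elpd values,
# e.g. the `pointwise` column returned by the kfold function above.
compare_pointwise <- function(elpd_a, elpd_b) {
  stopifnot(length(elpd_a) == length(elpd_b))
  diff <- as.vector(elpd_a) - as.vector(elpd_b)  # per-observation differences
  n <- length(diff)
  c(elpd_diff = sum(diff),
    se_diff = sqrt(n * var(diff)))               # SE of the summed difference
}
```

Because the pointwise values of two models fitted to the same data are usually strongly correlated, the SE of the paired difference is typically much smaller than what you would get by combining the two models' separate se_elpd_kfold values.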