When using K-fold cross-validation, what is a good way to compare models?
I read this article, in which the log likelihoods of the held-out data points are saved after each K-fold refit, and the models are then compared with `compare(loo(loglik_model1), loo(loglik_model2), loo(loglik_model3))`.
Is this better than calling `kfold(loglik_model1)`, `kfold(loglik_model2)`, and `kfold(loglik_model3)`?
My `kfold()` function (below) essentially sums the pointwise held-out log predictive densities.
Do I need to pass the held-out log likelihood to `loo()`, as in `loo(loglik_model1)`, instead of just calling `kfold(loglik_model1)`, so that the effective number of parameters is taken into account?
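For reference, these are the quantities the function below computes, following equations (20) and (21) of @VehtariEtAl2016 as cited in its comments. The notation here is my own shorthand: $\theta^{(k,s)}$ is draw $s = 1,\dots,S$ from the posterior fitted with fold $k$ left out, $k(i)$ is the fold containing $y_i$, and there are $N$ data points.

$$
\widehat{\mathrm{elpd}}_i = \log\left(\frac{1}{S}\sum_{s=1}^{S} p\left(y_i \mid \theta^{(k(i),s)}\right)\right),
\qquad
\widehat{\mathrm{elpd}}_{\text{kfold}} = \sum_{i=1}^{N} \widehat{\mathrm{elpd}}_i,
$$

$$
\mathrm{se}\left(\widehat{\mathrm{elpd}}_{\text{kfold}}\right) = \sqrt{N \,\mathrm{Var}_i\left(\widehat{\mathrm{elpd}}_i\right)}.
$$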
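```r
library(matrixStats)

# K-fold CV elpd from held-out log-likelihood draws.
# log_lik_heldout: S x N matrix; column n holds S posterior draws of
# log p(y_n | theta) from the fit that excluded the fold containing y_n.
kfold <- function(log_lik_heldout) {
  # Numerically more stable than log(colMeans(exp(x)))
  logColMeansExp <- function(x) {
    S <- nrow(x)
    colLogSumExps(x) - log(S)
  }
  # See equation (20) of @VehtariEtAl2016
  pointwise <- matrix(logColMeansExp(log_lik_heldout), ncol = 1)
  colnames(pointwise) <- "elpd"
  # See equation (21) of @VehtariEtAl2016
  elpd_kfold <- sum(pointwise)
  N <- ncol(log_lik_heldout)
  # var() of a vector gives a scalar (var() of a 1-column matrix gives a 1x1 matrix)
  se_elpd_kfold <- sqrt(N * var(pointwise[, 1]))
  out <- list(
    pointwise = pointwise,
    elpd_kfold = elpd_kfold,
    se_elpd_kfold = se_elpd_kfold
  )
  # structure(out, class = "loo")
  out
}
```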
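A minimal sketch of a paired comparison between two models, assuming both were refit with the same fold assignment so that their pointwise values line up. As far as I understand, this mirrors what `loo::loo_compare()` reports: the difference is taken point by point, and its standard error comes from the variance of the pointwise differences rather than from the two separate standard errors. `compare_kfold()` and `log_col_means_exp()` are hypothetical helpers written only for illustration (base R, no packages), and the log-likelihood matrices are simulated.

```r
# Hypothetical helper: numerically stable log(colMeans(exp(x))) in base R.
# Subtracts each column's max before exponentiating to avoid underflow.
log_col_means_exp <- function(x) {
  m <- apply(x, 2, max)
  m + log(colMeans(exp(sweep(x, 2, m))))
}

# Hypothetical helper: paired K-fold comparison of two models.
# log_lik_a, log_lik_b: S x N held-out log-likelihood matrices for the
# same N data points, computed with the same fold assignment.
compare_kfold <- function(log_lik_a, log_lik_b) {
  stopifnot(ncol(log_lik_a) == ncol(log_lik_b))
  pw_a <- log_col_means_exp(log_lik_a)  # pointwise elpd, model a
  pw_b <- log_col_means_exp(log_lik_b)  # pointwise elpd, model b
  diff <- pw_a - pw_b                   # pointwise differences
  N <- length(diff)
  c(elpd_diff = sum(diff), se_diff = sqrt(N * var(diff)))
}

# Usage with simulated (not real) held-out log likelihoods
set.seed(1)
S <- 1000
N <- 50
ll_a <- matrix(rnorm(S * N, mean = -1.0, sd = 0.1), nrow = S)
ll_b <- matrix(rnorm(S * N, mean = -1.2, sd = 0.1), nrow = S)
compare_kfold(ll_a, ll_b)
```

A positive `elpd_diff` favours the first model; how large it is relative to `se_diff` indicates how clear that preference is.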