Issues fitting a multidimensional IRT model with K-fold cross-validation

Thank you, sir! Yes, I agree that the correct approach is just to get the log_col_means that you mentioned. Those are in fact the elpd_kfold values that I present at the top of the post. It's understandable that you might have missed that because, despite the detail in the post, I still forgot to include some code. In the R code listing, do_K_fold_cross_validation_memory_friendly() calls the function kfold(), which was not included in my first post. Here it is:

# This is from https://www.r-bloggers.com/k-fold-cross-validation-in-stan/
#compute ELPD
kfold <- function(log_lik_heldout)  {
  library(matrixStats)
  logColMeansExp <- function(x) {
    # should be more stable than log(colMeans(exp(x)))
    S <- nrow(x)
    colLogSumExps(x) - log(S)
  }
  # See equation (20) of @VehtariEtAl2016
  pointwise <-  matrix(logColMeansExp(log_lik_heldout), ncol= 1)
  colnames(pointwise) <- "elpd"
  # See equation (21) of @VehtariEtAl2016
  elpd_kfold <- sum(pointwise)
  se_elpd_kfold <-  sqrt(ncol(log_lik_heldout) * var(pointwise))
  out <- list(
    pointwise = pointwise,
    elpd_kfold = elpd_kfold,
    se_elpd_kfold = se_elpd_kfold)
  #structure(out, class = "loo")
  return(out)
}
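
For context, kfold() expects the held-out log-likelihood matrix assembled across all of the folds. A minimal sketch of the intended call, assuming log_lik_heldout is an S x N matrix (S posterior draws by N observations, with each observation's column taken from the fold in which that observation was held out):

# Hypothetical usage; log_lik_heldout is assumed to be the S x N matrix that
# do_K_fold_cross_validation_memory_friendly() assembles from the folds.
kk <- kfold(log_lik_heldout)
kk$elpd_kfold     # sum of the pointwise elpd values
kk$se_elpd_kfold  # standard error of that sum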

The reason for my get_loo_from_kfold() function is just to attempt to reproduce the loo results using the log likelihoods from the k-fold cross-validation. I recognize that Pareto Smoothed Importance Sampling is not necessary in that case, but I figured that if I could at least come close to replicating the results from the models that did not use k-fold cross-validation, then I would be more confident that I had not simply introduced a bug when creating the folds or implementing my Stan model for k-fold cross-validation. So mathematically I agree that this was an unnecessary approach, but it seemed like a reasonable check to do. (I was also initially misled by this blog, which runs loo on the likelihoods returned by k-fold cross-validation for the purpose of model comparison, so it appears that this confusion may still be prevalent in the Stan community.)
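
To make the check concrete, here is a minimal sketch of the kind of comparison I mean (not the actual get_loo_from_kfold() code): log_lik_heldout is the k-fold held-out log-likelihood matrix as above, and log_lik_full is a placeholder for the log-likelihood matrix from the corresponding fit without cross-validation.

# Sketch only: loo() on the held-out k-fold draws (redundant, as noted above)
# versus loo() on the ordinary full-data fit; the point is just to see whether
# the two elpd estimates land in the same neighborhood.
library(loo)
loo_from_kfold <- loo(log_lik_heldout)
loo_from_full  <- loo(log_lik_full)
rbind(kfold_fit = loo_from_kfold$estimates["elpd_loo", ],
      full_fit  = loo_from_full$estimates["elpd_loo", ])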

It looks like in my first post I also omitted the very last few lines of code that actually display the elpd_kfold values. Here’s what that looks like, for clarity:

load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/3d_k_fold_elpd.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_elpd.dta")

kfold18$kk$elpd_kfold
# [1] -22758.8 [compare to top of post]
kfold19$kk$elpd_kfold
# [1] -22492.77

kfold18$kk$se_elpd_kfold
# 107.6672
kfold19$kk$se_elpd_kfold
# 105.9595

# I'm artificially and misleadingly assigning the class "loo" to these results so that I can 
# take functions from loo_compare.R in the loo package, only modify a few lines of code, and use 
# them to compare the results for the kfold analyses. It's a complete hack, written for expediency.
kfold18_struct <- structure(kfold18$kk, class="loo") 
kfold19_struct <- structure(kfold19$kk, class="loo")
compare_kfold(kfold18_struct, kfold19_struct)
# elpd_diff        se 
# 266.0           19.3

# The rest of this listing is all code from loo_compare.R, except for just a few lines, and I've 
# changed the function names...
compare_kfold <- function(..., x = list()) {
  dots <- list(...)
  if (length(dots)) {
    if (length(x)) {
      stop("If 'x' is specified then '...' should not be specified.",
           call. = FALSE)
    }
    nms <- as.character(match.call(expand.dots = TRUE))[-1L]
  } else {
    if (!is.list(x) || !length(x)) {
      stop("'x' must be a list.", call. = FALSE)
    }
    dots <- x
    nms <- names(dots)
    if (!length(nms)) {
      nms <- paste0("model", seq_along(dots))
    }
  }

  #if (!all(sapply(dots, is.loo))) {
  #  stop("All inputs should have class 'loo'.")
  #}
  if (length(dots) <= 1L) {
    stop("'compare' requires at least two models.")
  } else if (length(dots) == 2L) {
    loo1 <- dots[[1]]
    loo2 <- dots[[2]]
    comp <- compare_two_models_kfold(loo1, loo2)
    return(comp)
  } else {
    Ns <- sapply(dots, function(x) nrow(x$pointwise))
    if (!all(Ns == Ns[1L])) {
      stop("Not all models have the same number of data points.", call. = FALSE)
    }

    x <- sapply(dots, function(x) {
      est <- x$estimates
      setNames(c(est), nm = c(rownames(est), paste0("se_", rownames(est))) )
    })
    colnames(x) <- nms
    rnms <- rownames(x)
    comp <- x
    ord <- order(x[grep("^elpd", rnms), ], decreasing = TRUE)
    comp <- t(comp)[ord, ]
    patts <- c("elpd", "p_", "^waic$|^looic$", "^se_waic$|^se_looic$")
    col_ord <- unlist(sapply(patts, function(p) grep(p, colnames(comp))),
                      use.names = FALSE)
    comp <- comp[, col_ord]

    # compute elpd_diff and se_elpd_diff relative to best model
    rnms <- rownames(comp)
    # Use the renamed helpers defined below so this branch matches the rest of the hack
    diffs <- mapply(elpd_diffs_kfold, dots[ord[1]], dots[ord])
    elpd_diff <- apply(diffs, 2, sum)
    se_diff <- apply(diffs, 2, se_elpd_diff_kfold)
    comp <- cbind(elpd_diff = elpd_diff, se_diff = se_diff, comp)
    rownames(comp) <- rnms
    class(comp) <- c("compare.loo", class(comp))
    comp
  }
}

compare_two_models_kfold <- function(loo_a, loo_b, return = c("elpd_diff", "se"), check_dims = TRUE) {
  if (check_dims) {
    #if (dim(loo_a)[2] != dim(loo_b)[2]) {
    if (dim(loo_a[[1]])[1]!=dim(loo_b[[1]])[1]){
      stop(paste("Models don't have the same number of data points.",
                 "\nFound N_1 =", dim(loo_a[[1]])[1], "and N_2 =", dim(loo_b[[1]])[1]),
           call. = FALSE)
    }
  }

  diffs <- elpd_diffs_kfold(loo_a, loo_b)
  comp <- c(elpd_diff = sum(diffs), se = se_elpd_diff_kfold(diffs))
  structure(comp, class = "compare.loo")
}

elpd_diffs_kfold <- function(loo_a, loo_b) {
  pt_a <- loo_a$pointwise
  pt_b <- loo_b$pointwise
  elpd <- grep("^elpd", colnames(pt_a))
  pt_b[, elpd] - pt_a[, elpd]
}
se_elpd_diff_kfold <- function(diffs) {
  N <- length(diffs)
  sqrt(N) * sd(diffs)
}
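
As an aside, the two-model result can also be reproduced directly from the pointwise matrices, without the class-"loo" hack. A minimal sketch, assuming kfold18$kk and kfold19$kk are the objects loaded above:

# Direct equivalent of compare_kfold(kfold18_struct, kfold19_struct):
diffs <- kfold19$kk$pointwise[, "elpd"] - kfold18$kk$pointwise[, "elpd"]
c(elpd_diff = sum(diffs),                       # ~266.0
  se        = sqrt(length(diffs)) * sd(diffs))  # ~19.3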

Thank you again for your time, sir!

And this question is for anyone: in a related post, Ben Goodrich mentioned that “if you are getting variance on the order of days, then your Markov chain is not efficient enough to be doing anything reliable with the draws anyways.” I feel as though I do not know enough about how Hamiltonian Monte Carlo with the No-U-Turn Sampler explores the posterior distribution to understand when I can and cannot draw inferences from the results. I may have mistakenly thought that as long as the standard convergence diagnostics look okay (checking autocorrelations and making sure the chains mix), I am safe and the difference in running times is not relevant. What is the proper literature to refer to?

Thanks,

Richard