Dear Stan users:

Hi, I’m picking up a project that I had to shelve for a few months. I’m fitting multidimensional Item Response Theory (IRT) models and am having trouble with large Pareto k values during leave-one-out (LOO) cross-validation. This happens even when the model is exactly right: I generated the data myself from an IRT model with the same number of dimensions as the model I fit to it. The reason, I believe, is that by random chance some respondents answer a difficult test item correctly despite having low ability, and IRT models have so many parameters that a few such data points are almost guaranteed to show up as outliers with high Pareto k values. Additionally, I am using priors to identify the models (setting the origin, units, and orientation of the axes of the coordinate system describing the model, as per Reckase (2009)), and it is possible, since I am new to multidimensional IRT, that while my choice of priors identifies the model, the priors are still weak enough to give those outliers room to work their mischief, leading to the high Pareto k values. I probably have more reading to do on how other researchers have chosen priors for specific models, and my gut tells me that at least half of that knowledge only comes with experience.

Dr. Vehtari recommended K-fold cross-validation (specifically 10-fold) as a workaround for the large Pareto k values encountered with LOO, so I have implemented K-fold cross-validation for my multidimensional IRT models. It appears to be working and produces valid-looking results (shown below), which is a small victory for me: I am new to both Stan and multidimensional IRT, with their attendant issues of prior selection and label switching within and between chains, issues that do not arise with unidimensional models. The problem is that, under K-fold cross-validation, my three-dimensional IRT model appears to fit two-dimensional simulated data better than the two-dimensional model does:

```
########################
# K-fold cross-validation comparison
#
#                 2-d model  3-d model
# elpd_kfold       -22758.8  -22492.77
# se_elpd_kfold      107.67     105.96
#
# [pseudocode for a] custom function comparing elpd results for K-fold cross-validation models
# that I hacked together from functions in loo_compare.R in the 'loo' package
#
#> compare_kfold([2-d model], [3-d model])
#  elpd_diff    se
#      266.0  19.3
#
########################
```
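
For reference, here is a minimal sketch of the arithmetic I believe my compare_kfold() performs; it is not the actual code (which I adapted from loo_compare.R), just the idea: compute a per-observation elpd contribution from each held-out log-likelihood matrix, then take the paired difference and its standard error. The names ee2 and ee3 are placeholders for the $ee_for_kfold matrices returned by do_K_fold_cross_validation_memory_friendly() below.

```
## Placeholder sketch: ee2 and ee3 are the (samples x N_obs) held-out log-likelihood
## matrices for the 2-d and 3-d models (the $ee_for_kfold elements returned below).
log_mean_exp <- function(x) {                 # numerically stable log(mean(exp(x)))
  m <- max(x)
  m + log(mean(exp(x - m)))
}
pointwise_elpd <- function(ll) apply(ll, 2, log_mean_exp)  # one elpd_i per observation

diff_i <- pointwise_elpd(ee3) - pointwise_elpd(ee2)        # paired, observation by observation
c(elpd_diff = sum(diff_i),
  se_diff   = sqrt(length(diff_i)) * sd(diff_i))
```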

Out of curiosity, and as a check on my coding, I also calculated LOO cross-validation statistics from the held-out log-likelihoods of the K folds (the code that assembles that log-likelihood is at the end of this post). The resulting LOO comparisons replicate the results I had obtained from models fit without the extra step of splitting the data into 10 folds and combining the held-out likelihoods, which I take as at least an oblique confirmation that my K-fold implementation works. Here are those results, though note that I don’t completely trust the LOO results anyway because of the high Pareto k values:

```
########################
# LOO comparison (using likelihood assembled from k-fold cross-validation models)
#            elpd_diff  se_diff  elpd_loo    p_loo     looic
# 2-d model        0.0      0.0  -24975.9   2217.1   49951.9
# 3-d model      -84.9     26.6  -25060.8   2568.0   50121.6
#
# LOO comparison (using likelihoods from individual models)
#            elpd_diff  se_diff  elpd_loo    p_loo     looic
# 2-d model        0.0      0.0  -24581.8   1971.3   49163.7
# 3-d model      -23.6      3.8  -24605.5   2291.3   49210.9
# 1-d model     -586.6     33.5  -25168.5   1231.1   50336.9
########################
```

(I accidentally ran the 1-d K-fold model using the Stan file that does not implement K-fold cross-validation, so I’ll have to re-run that model before I can report its results; in any case, my question is about the discrepancy between the two-dimensional and three-dimensional models.)

The LOO results in both cases show that the two-dimensional model fits the data best, beating the three-dimensional model by a close but statistically significant margin. That would be great, except that I just can’t trust those results when the Pareto k values are large. Here are those values for the LOO statistics computed from the K-fold cross-validation models:

```
> loo18   # [2-d model]

Computed from 1500 by 51300 log-likelihood matrix

         Estimate    SE
elpd_loo -24975.9 123.6
p_loo      2217.1  21.2
looic     49951.9 247.2
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count   Pct.   Min. n_eff
(-Inf, 0.5]   (good)     51036  99.5%   165
 (0.5, 0.7]   (ok)         249   0.5%    99
   (0.7, 1]   (bad)         15   0.0%    33
   (1, Inf)   (very bad)     0   0.0%   <NA>
See help('pareto-k-diagnostic') for details.

> loo19   # [3-d model]

Computed from 1500 by 51300 log-likelihood matrix

         Estimate    SE
elpd_loo -25060.8 124.0
p_loo      2568.0  23.6
looic     50121.6 248.0
------
Monte Carlo SE of elpd_loo is NA.

Pareto k diagnostic values:
                         Count   Pct.   Min. n_eff
(-Inf, 0.5]   (good)     50866  99.2%   150
 (0.5, 0.7]   (ok)         412   0.8%    89
   (0.7, 1]   (bad)         20   0.0%    56
   (1, Inf)   (very bad)     2   0.0%    12
See help('pareto-k-diagnostic') for details.
```
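
To see which observations are being flagged, the high-k columns of the log-likelihood matrix can be mapped back to (respondent, item) pairs. A minimal sketch, assuming loo's pareto_k_ids()/pareto_k_values() helpers and rstan's usual flattening of log_lik[i, j] with the first index varying fastest (worth double-checking against the column names of the extracted matrix):

```
## Sketch only: maps flagged columns back to (respondent, item), assuming the columns
## are ordered log_lik[1,1], log_lik[2,1], ..., log_lik[1350,1], log_lik[1,2], ...
library(loo)
bad <- pareto_k_ids(loo18, threshold = 0.7)       # column indices with k > 0.7
n_respondents <- 1350                             # from the simulation script at the end
respondent <- ((bad - 1) %% n_respondents) + 1
item       <- ((bad - 1) %/% n_respondents) + 1
data.frame(respondent, item, k = pareto_k_values(loo18)[bad])
```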

These models, fit to simulated data, are already a best-case scenario in which relatively few observations are associated with high Pareto k values. My actual data are more skewed, with a few low-ability respondents improbably answering very difficult items correctly.
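
(For the simulated data it is easy to count how many such surprising responses exist at all, using the true pY and Y from the data-generation script at the end; the 5% cutoff below is arbitrary.)

```
## Quick check on the simulated data: responses that are very unlikely given the
## true probabilities (pY and Y come from the simulation script at the end).
surprising <- (Y == 1 & pY < 0.05) | (Y == 0 & pY > 0.95)   # arbitrary 5% cutoff
sum(surprising)                                             # how many such observations
which(surprising, arr.ind = TRUE)                           # their (respondent, item) positions
```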

In short, by all appearances I do not have a bug in my code implementing the K-fold cross-validation models, yet using that approach instead of LOO cross-validation would lead me to select a model that is inconsistent with the actual dimensionality of the data, whereas LOO would lead me to choose the correct one!

For these models, to get a statistically significant difference between the two- and three-dimensional models I’m working with about 1,400 simulated respondents answering about 40 simulated test items. On my four-core processor a single chain takes a number of hours, which translates to about a week for 10-fold cross-validation with three chains per fold. Tweaking the model and re-fitting is therefore costly, which is why I am posting. I’m not sure how to proceed. One hypothesis is that I need a larger simulated data set: when the model sees only 90% of the data the signal is slightly weaker, so the three-dimensional model may end up fitting more of the noise and thereby out-performing the two-dimensional model. But I don’t like the idea of running a model for two weeks just to test a hunch. A second possibility is simply that I have a bug in my code somewhere, but my replication of the LOO results suggests that’s not so.

Previously, I had to deal with label switching between chains and, for five-dimensional models and higher, within chains as well. I wonder whether something similar is going wrong during cross-validation, such that when I combine the likelihoods across the multiple folds I am accidentally treating chains with different labelings as if they had the same labels, introducing noise that the three-dimensional model fits better than the two-dimensional model simply because it has more parameters. But if that were so, why do my LOO results for the models run with 10-fold cross-validation match those for the models run without it?
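
For what it’s worth, here is a rough sketch of the kind of check I have in mind for a between-chain dimension permutation within one fold. It assumes one of the merged per-fold fits saved by stan_kfold_memory_friendly() below (save_file_stem is the same path passed to that function) and rstan’s usual column-major ordering of the alpha matrix (first index varying fastest); if the labels have switched, the large correlations show up off the diagonal.

```
## Sketch: compare the posterior-mean discrimination matrix (alpha) across chains.
library(rstan)
load(paste(save_file_stem, "k=", 1, sep = ""))                    # loads "merged_model" for fold 1
draws <- extract(merged_model, pars = "alpha", permuted = FALSE)  # iterations x chains x (n_items * D)
chain_means <- apply(draws, c(2, 3), mean)                        # one row per chain
n_items <- 38; D <- 2                                             # dimensions of alpha in this run
alpha_chain <- lapply(seq_len(nrow(chain_means)), function(ch)
  matrix(chain_means[ch, ], nrow = n_items, ncol = D))            # assumes column-major ordering
round(cor(alpha_chain[[1]], alpha_chain[[2]]), 2)                 # D x D; off-diagonal dominance => switching
```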

One odd behavior that I’ve observed is that the two-dimensional model often takes much longer to finish than the three-dimensional model. The variance in running time per chain is larger, so it’s really just a few chains that take a long time to complete (ranging from three hours to as long as three days for a single chain), for a total of nine days using all four cores of a four-core CPU. With the three-dimensional model the running time was consistently much shorter (about five hours per chain, so thirty chains on three cores finished in about two days). I thought at first that using all four cores for the two-dimensional model was slowing things down through context switching, with the operating system fighting over computation time, but repeating the run with three cores also produces a high variance in per-chain running time. I don’t know what could be producing this behavior.
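
A small sketch of how I plan to quantify this, using rstan’s get_elapsed_time() and get_sampler_params() on the per-fold fits saved by stan_kfold_memory_friendly() below (save_file_stem is the same path passed to that function); whether the slow chains are hitting the max_treedepth = 15 cap seems worth checking:

```
## Sketch: per-chain run times and maximum treedepth for each fold's merged fit.
library(rstan)
for (k in 1:10) {
  load(paste(save_file_stem, "k=", k, sep = ""))                   # loads "merged_model"
  print(get_elapsed_time(merged_model))                            # seconds per chain (warmup, sample)
  print(sapply(get_sampler_params(merged_model, inc_warmup = FALSE),
               function(x) max(x[, "treedepth__"])))               # are slow chains hitting the cap of 15?
}
```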

I’m not sure how to proceed. I definitely need the model comparisons to work properly with the simulated data before I would be able to trust my conclusions with my actual data.

Below is my code for the one-dimensional and multi-dimensional Stan models using K-fold cross-validation, as well as the code that creates the folds and computes the log-likelihood. Per the forum guidelines, at the end I’m also including the code that simulates the data and that actually runs the models, but fair warning: these models took 11 days to run and 23 GB of storage space! If anyone has any ideas about how to proceed, I’d greatly appreciate it, as it could take me quite some time to work out what is happening here on my own.

A last point: to deal with the skew in my actual data, I’ve written a (definitely suboptimal) heuristic that separates all responses into two groups for the purpose of stratified assignment of observations to folds with the caret package. For my first analysis of the simulated data (shown here), assignment to each group was random and each group contained half the data. That approach led to the three-dimensional model fitting the two-dimensional data best, but I doubt that the way I did the stratification explains what is going on. Nonetheless, the second round of analyses currently underway uses a different heuristic, designed to take the roughly 20% of observations that are least likely to occur (i.e. the source of the skew) and place them in one group, so that these outliers are distributed evenly across all folds. The simulated data do not exhibit any skew, so this detail should be irrelevant, but I mention it for completeness. The second round should be complete in five days or so.
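
Here is a condensed sketch of that second-round heuristic (placeholder variable names; the real version lives in my get_group_membership(), which is not shown). p_hat stands for a rough estimate of each response’s probability, e.g. from a cheap unidimensional fit:

```
## Sketch with placeholder names: put the rarest ~20% of observed responses into their
## own stratum so caret::createFolds() spreads them evenly across the 10 folds.
library(caret)
flat_Y <- as.vector(as.matrix(data$Y))
rarity <- ifelse(flat_Y == 1, 1 - p_hat, p_hat)             # how improbable each observed response is
group  <- ifelse(rarity > quantile(rarity, 0.80), 2, 1)     # rarest ~20% form group 2
folds  <- createFolds(factor(group), k = 10, list = FALSE)  # stratified fold assignment
table(group, folds)                                         # check group 2 is spread evenly
```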

Thank you very much for your time! I hope this post may also help someone else dealing with this type of K-fold cross-validation problem for data-hungry models.

Implementation details concerning the K-fold cross-validation: I’m parallelizing across chains with mclapply() (via the parallel/pbmcapply packages). As it stands, each core computes 10 of the 30 chains and then stops. It would be great to know how to get the core(s) that finish first to run additional chains, because currently at the end I am waiting for one straggling core to handle two or three models in serial. But that’s a separate thread.
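
(Side note on that point, since it is cheap to state here: I believe the straggler issue is just mclapply()’s default prescheduling. A minimal sketch of the change I have in mind for the call inside stan_kfold_memory_friendly() below; with mc.preschedule = FALSE, jobs are handed out one at a time as cores free up, so the early finishers pick up the remaining chains instead of idling.)

```
## Sketch: dynamic load balancing for the chain-level parallelism
## (n_fold, chains, cores, and generate_sample_chain as defined in stan_kfold_memory_friendly()).
library(parallel)
sflist <- mclapply(1:(n_fold * chains),
                   generate_sample_chain,
                   mc.cores       = cores,
                   mc.preschedule = FALSE)   # fork one job at a time as cores become free
```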

Best,

Richard

```
#### STAN MODEL
## Stan file for the multidimensional IRT model with K-fold cross-validation:
```

```
// see Reckase (2009) and https://rfarouni.github.io/assets/projects/BayesianMIRT/BayesianMIRT.html
// Identification: theta gets a standard multivariate normal prior (fixing the origin and units of the
// latent space), and alpha is constrained positive with a lognormal prior.
data{
  int<lower=1> n_respondents;
  int<lower=1> n_items;
  int<lower=0,upper=1> Y[n_respondents,n_items];
  int<lower=1> D; // number of dimensions
  int<lower=0,upper=1> holdout[n_respondents,n_items];
}
transformed data{
  cov_matrix[D] Sigma;
  row_vector[D] mu_theta = rep_row_vector(0.0, D);
  Sigma = rep_matrix(0.0, D, D);
  for(i in 1:D){
    Sigma[i,i] = 1.0;
  } // = [ [1, 0, 0], [0, 1, 0], [0, 0, 1] ]; // Identity matrix as per Reckase (2009) and Chun Wang (2015)
}
parameters{
  vector[D] theta[n_respondents];
  matrix<lower=0>[n_items,D] alpha;
  real mu_alpha;
  vector[n_items] beta;
  real<lower=0> sigma_alpha;
  real<lower=0> sigma_beta;
  real mu_beta;
}
model{
  // hyperpriors
  sigma_alpha ~ cauchy(0,5);  // RJC: could allow the prior mean for alpha to vary as well
  mu_alpha ~ normal(0,4);     // weak but not uninformative
  sigma_beta ~ cauchy(0,5);   // weak but not uninformative
  mu_beta ~ normal(0.0, 4);   // weak but not uninformative
  // priors
  for(i in 1:n_respondents){
    theta[i] ~ multi_normal(mu_theta, Sigma); // Reckase 2009
  }
  beta ~ normal(mu_beta,sigma_beta);
  to_vector(alpha) ~ lognormal(mu_alpha, sigma_alpha);
  // likelihood
  // More efficient to write loops column by column as matrices are stored internally in column-major order (Stan manual p. 327)
  //int total_data_counter=0;
  //int sample_data_counter=0;
  for(j in 1:n_items){
    for(i in 1:n_respondents){
      if(holdout[i,j]==0){
        real p;
        p = inv_logit(row(alpha,j)*theta[i]+beta[j]);
        target += bernoulli_lpmf(Y[i,j] | p);
        //sample_data_counter += 1;
      }
      //total_data_counter += 1;
      // print("alpha[j] = ", alpha[j]);
      // print("theta[i] = ", theta[i]);
    }
  }
  //print("sample_data_counter = ", sample_data_counter); // for debugging
  //print("total_data_counter = ", total_data_counter);   // for debugging
}
generated quantities{
  vector[n_items] log_lik[n_respondents];
  vector[n_items] pY[n_respondents];
  for(j in 1:n_items){
    for(i in 1:n_respondents){
      pY[i,j] = inv_logit(row(alpha,j)*theta[i]+beta[j]);
      log_lik[i,j] = bernoulli_lpmf(Y[i,j] | pY[i,j]);
    }
  }
}
```

```
#### R FUNCTIONS
# Simple function to take the results from a K-fold cross-validation and return the loo value.
# Of course, in practice the whole point for doing a K-fold cross-validation is that you can't calculate loo
# directly. The only point of this function is for checking that the results from both K-fold and
# loo are consistent for models for which loo would work. It's a check that my code works.
get_loo_from_kfold <- function(kfold){
  log_lik <- kfold$ee_for_loo
  r_eff <- relative_eff(exp(log_lik))
  result <- loo(log_lik, r_eff=r_eff)
  return(result)
}
## This function takes in a data set (for input to a normal Stan object),
## the Stan file that normally gets run, a number of folds (default = 10),
## and a variable with levels along which we'll do the stratification using caret.
## It generates the folds, runs Stan K*nchains times, gets the corresponding Stan objects,
## and extracts the log likelihood. It returns a single matrix of log-likelihoods for all the hold-out data.
## I'll then need to run kfold() on the results.
## One assumption that I make is that the data I pass in has the actual outcomes stored in data$Y.
do_K_fold_cross_validation_memory_friendly <- function(data,
                                                       stan_file,
                                                       save_file_stem,
                                                       do_stratification=TRUE,
                                                       n_folds=10,
                                                       n_chains=3,
                                                       n_cores=3,
                                                       merge_chains=FALSE,
                                                       iter=2000,
                                                       control=list(max_treedepth = 15),
                                                       desired_stratification_percentage=0.015,
                                                       truncation_percentage=0.02,
                                                       debug=TRUE)
{
  library(rstan)
  rstan_options(auto_write = TRUE)
  library(pbmcapply)
  library(loo)
  library(caret)
  K <- n_folds
  ## TODO: Make this its own function. (Get holdout data in flattened and unflattened formats)
  flattened.data <- as.vector(as.matrix(data$Y))
  if(do_stratification){
    ones_and_twos <- get_group_membership(data,
                                          desired_group_1_percentage=desired_stratification_percentage,
                                          truncation_percentage=truncation_percentage,
                                          debug=TRUE)
  }else{
    ones_and_twos <- rep(1, length(flattened.data))
    ones_and_twos[((length(flattened.data)/2)+1):length(flattened.data)] <- 2
  }
  folds <- createFolds(ones_and_twos, n_folds, list=FALSE)
  holdout_K_from_caret <- list()
  for(k in 1:K)
  {
    holdout_vector <- rep(0,nrow(data$Y)*ncol(data$Y))
    holdout_vector[which(folds==k)] <- 1
    #holdout_vector <- flattened.data[folds[[k]]]
    holdout_K_from_caret[[k]] <- unflatten(holdout_vector,nrow=nrow(data$Y),ncol=ncol(data$Y))
    #unflatten(holdout_vector,nrow=nrow(data_irt$Y),ncol=ncol(data_irt$Y))
  }
  holdout_K_from_caret_flattened <- list()
  for(i in 1:K){
    holdout_K_from_caret_flattened[[i]] <- flatten(holdout_K_from_caret[[i]])
  }
  #run the functions
  I <- dim(data)[1]
  J <- dim(data)[2]
  #data_m <- list(n_respondents=I,n_items=J,Y=data)
  data_m <- data
  #create a list of data list
  data_l <- rep(list(data),K)
  #add the holdout index to it
  #for(i in 1:10) data_l[[i]]$holdout <- holdout_10[[i]]
  for(i in 1:K) data_l[[i]]$holdout <- holdout_K_from_caret[[i]]
  cat("In do_K_fold_cross_validation: Running K memory-friendly Stan models...\n")
  stan_kfold_memory_friendly(file=stan_file,
                             save_file_stem=save_file_stem,
                             data_l,
                             chains=n_chains,
                             cores=n_cores,
                             iter=iter,
                             control=control)
  # This doesn't return a list of Stan fit objects. So I want to pass in the
  # save file stem to extract_log_lik_K.
  cat("In do_K_fold_cross_validation: extracting log-likelihood...\n")
  ee_for_loo <- extract_log_lik_K_memory_friendly(save_file_stem=save_file_stem,
                                                  n_folds=K,
                                                  list_of_holdout=holdout_K_from_caret_flattened,
                                                  merge_chains=merge_chains)
  ee_for_kfold <- extract_log_lik_K_memory_friendly(save_file_stem=save_file_stem,
                                                    n_folds=K,
                                                    list_of_holdout=holdout_K_from_caret_flattened)
  #r_eff <- relative_eff(exp(ee))
  cat("In do_K_fold_cross_validation: calculating ELPD and standard error...\n")
  kk <- kfold(ee_for_kfold)
  result <- list()
  result$kk <- kk
  result$ee_for_loo <- ee_for_loo
  result$ee_for_kfold <- ee_for_kfold
  return(result)
}
# Adapted from a function https://www.r-bloggers.com/k-fold-cross-validation-in-stan/ to use less memory
# all at once
extract_log_lik_K_memory_friendly <- function(save_file_stem, n_folds, list_of_holdout, ...){
  require(loo)
  K <- n_folds
  #list_of_log_liks <- plyr::llply(1:K, function(k){
  #  extract_log_lik(list_of_stanfits[[k]],...)
  #})
  load(paste(save_file_stem,"k=1",sep="")) # "merged_model"
  log_lik_1 <- extract_log_lik(merged_model,...)
  # A little inefficient -- I could probably save one read with clever coding -- but fine for now
  # `log_lik_heldout` will include the log-lik of all the held-out data of all the folds.
  # We define `log_lik_heldout` as a (samples x N_obs) matrix
  # (similar to each log_lik matrix)
  log_lik_heldout <- log_lik_1 * NA
  for(k in 1:K){
    load(paste(save_file_stem,"k=",k,sep="")) # "merged_model" for fold k
    log_lik <- extract_log_lik(merged_model,...)
    samples <- dim(log_lik)[1]
    N_obs <- dim(log_lik)[2]
    # This is a matrix with the same size as log_lik_heldout,
    # with 1 if the data point was held out in fold k
    heldout <- matrix(rep(list_of_holdout[[k]], each = samples), nrow = samples)
    # Sanity check that the previous log_lik is not being overwritten:
    if(any(!is.na(log_lik_heldout[heldout==1]))){
      warning("Heldout log_lik has been overwritten!!!!")
    }
    # We save here the log_lik of fold k in the matrix:
    log_lik_heldout[heldout==1] <- log_lik[heldout==1]
    rm(merged_model)
    gc()
  }
  return(log_lik_heldout)
}
## Revised version of stan_kfold that isn't as efficient, but which isn't as much of a memory hog.
## TODO: Error handling (low priority)
stan_kfold_memory_friendly <- function(file, save_file_stem, list_of_datas, chains, cores, iter, control, ...){
  library(pbmcapply)
  badRhat <- 1.1 # don't know why we need this?
  n_fold <- length(list_of_datas)
  model <- stan_model(file=file)
  generate_sample_chain <- function(i){
    # Fold number:
    k <- ceiling(i / chains)
    s <- sampling(model, data = list_of_datas[[k]],
                  chains = 1, chain_id = i,
                  iter = iter,
                  control = control)
    save(s, file=paste(save_file_stem,"i=",i,"_k=",k,sep=""))
    rm(s)
    gc()
    return(NULL)
  }
  #for(i in 1:(n_fold*chains)){
  #  s <- generate_sample_chain(i)
  #  rm(s)
  #  gc() # Freeing up memory
  #}
  # First parallelize all chains:
  sflist <- mclapply(1:(n_fold*chains), mc.cores = cores,
                     generate_sample_chain)
  # The sflist object is extraneous here.
  # NOTE: RStudio will mask the output of the individual calls to Stan().
  # If I want to see that output, run everything directly from the console!
  # It will be a bit of a mess to read, but provides finer-grain information
  # about the progress of parallelized model fitting for really big models.
  # Then merge the K * chains to create K stanfits, but save them to disk instead of returning in a list:
  for(k in 1:n_fold){
    sflist <- list()
    for(i in 1:chains){
      load(file=paste(save_file_stem,"i=",(chains*(k-1)+i),"_k=",k,sep=""))
      sflist[[i]] <- s
      rm(s)
      gc()
    }
    inchains <- 1:chains
    # Merge `chains` of each fold
    merged_model <- sflist2stanfit(sflist[inchains])
    save(merged_model, file=paste(save_file_stem,"k=",k,sep=""))
    # Delete the intermediate files I just created
    for(i in 1:chains){
      filename <- paste(save_file_stem,"i=",(chains*(k-1)+i),"_k=",k,sep="")
      if(file.exists(filename)){
        file.remove(filename)
      }
    }
  }
  # At this point the directory referenced by file_stem should contain K stan_fit files,
  # each containing n_chains chains. We don't have those loaded into memory.
  return()
}
######################################################################################
#### Data simulation
## Gen2DMC2PL_model53.R
## Generate 2-dimensional (multidimensional) compensatory 2PL model parameters and data
library(MASS)
remove(list=ls())
set.seed(12345) # model 16
N <- 1350
K <- 38 # 100 # 38 # 125 # 200 # 500 (broke R) # 38
ndim <- 2
# D <- 1.0
mu_theta <- c(0.00,0.00)
sigma_theta <- matrix(data=c(1.00, 0.00, 0.00, 1.00), nrow=2, ncol=2)
mu_alpha <- 0.0 # log(1.0)
sigma_alpha <- 0.5
mu_beta <- 1.0
sigma_beta <- 0.5
outfile <- "~/Documents/IRT_Research/Data/IRT_simulated_data/for_R/mod53_data.csv"
params_directory <- "~/Documents/IRT_Research/Data/IRT_simulated_data/Model53/"
# Simulate model parameters
thetas <- mvrnorm(N, mu=mu_theta, Sigma = sigma_theta)
alphas <- matrix(data=rlnorm(K*ndim, meanlog = mu_alpha, sdlog = sigma_alpha),nrow=K,ncol=ndim)
betas <- rnorm(K, mean=mu_beta, sd=sigma_beta)
# Compute probability that respondent i answers question j correctly
pY <- matrix(data=NA,nrow=N,ncol=K)
for(i in 1:nrow(pY))
{
for(j in 1:ncol(pY))
{
num <- exp(alphas[j,]%*%thetas[i,]+betas[j])
pY[i,j] <- num/(1+num)
}
}
# Simulate observed responses
Y <- matrix(data=NA,nrow=N,ncol=K)
for(i in 1:nrow(pY))
{
for(j in 1:ncol(pY))
{
if(runif(1,min=0,max=1)<pY[i,j])
{
Y[i,j] = 1.0
}
else
{
Y[i,j] = 0.0
}
}
}
length(which(is.na(Y)))
# Label the questions
col.labels <- rep(NA,ncol(Y))
for(j in 1:ncol(Y))
{
col.labels[j] <- paste("Q",j,sep="")
}
colnames(Y) <- col.labels
# Check results: responses and probabilities should be correlated at about 0.5
cor(as.vector(Y),as.vector(pY))
# Output results
write.csv(Y, file=outfile,quote=FALSE)
write.csv(thetas,
file=paste(params_directory,"thetas.csv",sep=""))
write.csv(alphas,
file=paste(params_directory,"alphas.csv",sep=""))
write.csv(betas,
file=paste(params_directory,"betas.csv",sep=""))
write.csv(pY,
file=paste(params_directory,"pY.csv",sep=""))
##################################################################################
##################################################################################
# Actually running the model
## Import libraries
source('~/Documents/IRT_Research/R_work/Stan_IRT_utilities.R')
options(mc.cores = parallel::detectCores())
library(rstan)
rstan_options(auto_write = TRUE)
library(loo)
library(bayesplot)
library(caret)
source('~/Documents/IRT_Research/R_work/k_fold_functions.R') # relevant functions are above
source('~/Documents/IRT_Research/R_work/loo_compare.R')
# This is from the R package loo, minor changes made to allow simple comparisons of two
# models fit with K-fold cross-validation
## Fit a K-fold 2-D model to the 2-D dataset
data <- read.table("~/Documents/IRT_Research/Data/IRT_simulated_data/for_R/mod53_data.csv",header=TRUE,sep=",",row.names=1)
I = dim(data)[1]
J = dim(data)[2]
data_irt <- list(n_respondents=I,n_items=J,D=2,Y=data)
kfold18 <- do_K_fold_cross_validation_memory_friendly(
  data=data_irt,
  stan_file="~/Documents/IRT_Research/Stan/Simulated_data/mod40/stan_simulated_data_model_40_kfold_cv.stan",
  save_file_stem="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/saved_models/2D_model/",
  do_stratification=FALSE,  # first round: two equal groups rather than the stratification heuristic
  n_folds=10,
  n_chains=3,
  n_cores=4,
  merge_chains=FALSE,
  iter=1000,
  control=list(max_treedepth = 15))
save(kfold18, file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_elpd.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_elpd.dta")
loo18 <- get_loo_from_kfold(kfold18)
save(loo18, file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_loo.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_loo.dta")
library(loo)
plot(loo18)
# This last model took nine days to run, ouch.
## Fit a K-fold 3-D model to the 2-D dataset
data <- read.table("~/Documents/IRT_Research/Data/IRT_simulated_data/for_R/mod53_data.csv",header=TRUE,sep=",",row.names=1)
I = dim(data)[1]
J = dim(data)[2]
data_irt <- list(n_respondents=I,n_items=J,D=3,Y=data)
kfold19 <- do_K_fold_cross_validation_memory_friendly(
  data=data_irt,
  stan_file="~/Documents/IRT_Research/Stan/Simulated_data/mod40/stan_simulated_data_model_40_kfold_cv.stan",
  save_file_stem="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/saved_models/3D_model/",
  do_stratification=FALSE,  # first round: two equal groups rather than the stratification heuristic
  n_folds=10,
  n_chains=3,
  n_cores=3,
  merge_chains=FALSE,
  iter=1000,
  control=list(max_treedepth = 15))
save(kfold19, file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/3d_k_fold_elpd.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/3d_k_fold_elpd.dta")
loo19 <- get_loo_from_kfold(kfold19)
#Warning message:
# In log(z) : NaNs produced
save(loo19, file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/3d_k_fold_loo.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/3d_k_fold_loo.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_loo.dta")
compare(loo18, loo19)
plot(loo18)
plot(loo19)
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/3d_k_fold_elpd.dta")
load(file="~/Documents/IRT_Research/Data/IRT_simulated_data/cv_test/Model5/2d_k_fold_elpd.dta")
source('~/Documents/IRT_Research/R_work/loo_compare.R')
kfold18_struct <- structure(kfold18$kk, class="loo")
kfold19_struct <- structure(kfold19$kk, class="loo")
compare_kfold(kfold18_struct, kfold19_struct)  # custom comparison from my modified loo_compare.R
```