Hello,
I’d like to double-check my model comparison workflow. Ideally we would use the lovely LOO/PSIS methods, but we haven’t had much luck with them on this project: our model is quite complicated, at least by my standards, and we end up with a lot of very slow computation and high Pareto k values.
Instead, we are trying to implement a good old-fashioned train/test split.
Does this workflow look correct:
library(cmdstanr)
library(stringr)
library(loo)

train_test_simple_FoMo <- function(df_train, ds_train, df_test, ds_test,
                                   FoMo_ver = "1.1",
                                   features = c("spatial", "item_class"),
                                   n_trials_to_sim = 1,
                                   iter = 500) {

  # prepare training data
  d_train_list <- prep_data_for_stan(df_train, ds_train, model_components = features)
  d_train_list <- add_priors_to_d_list(d_train_list, modelver = FoMo_ver)
  d_train_list$n_trials_to_sim <- n_trials_to_sim

  # compile the requested model version, e.g. FoMo1_1.stan
  mod <- cmdstan_model(paste0("../../models/simple/FoMo",
                              str_replace(FoMo_ver, "\\.", "_"), ".stan"))

  # fit to the training data
  m_train <- mod$sample(data = d_train_list,
                        chains = 4, parallel_chains = 4, threads = 4,
                        refresh = 0,
                        iter_warmup = iter, iter_sampling = iter,
                        sig_figs = 3)
  loglik_train <- m_train$draws("log_lik", format = "matrix")

  # evaluate on the held-out test data
  d_test_list <- prep_data_for_stan(df_test, ds_test, model_components = features)
  d_test_list <- add_priors_to_d_list(d_test_list, modelver = FoMo_ver)
  d_test_list$n_trials_to_sim <- n_trials_to_sim

  gen_test <- mod$generate_quantities(m_train, data = d_test_list, seed = 123)
  loglik_test <- gen_test$draws("log_lik", format = "matrix")
  elpd_test <- elpd(loglik_test)

  return(list(model = m_train,
              loglik = list(train = loglik_train, test = loglik_test),
              elpd_test = elpd_test))
}
# model 1.0
model10 <- train_test_simple_FoMo(df_train, ds_train, df_test, ds_test,
                                  FoMo_ver = "1.0",
                                  features = c("spatial", "item_class"),
                                  n_trials_to_sim = 1,
                                  iter = 500)

# model 1.1
model11 <- train_test_simple_FoMo(df_train, ds_train, df_test, ds_test,
                                  FoMo_ver = "1.1",
                                  features = c("spatial", "item_class"),
                                  n_trials_to_sim = 1,
                                  iter = 500)

# model 1.2
model12 <- train_test_simple_FoMo(df_train, ds_train, df_test, ds_test,
                                  FoMo_ver = "1.2",
                                  features = c("spatial", "item_class"),
                                  n_trials_to_sim = 1,
                                  iter = 500)
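(For context on my first question further down: my understanding is that elpd() applied to a raw test log-likelihood matrix is just the standard pointwise calculation, with no importance sampling involved, roughly as in the sketch below. The manual_elpd() and log_mean_exp() helpers are only my own illustration, not anything from the loo package.)

# Sketch of what I understand elpd() to compute from an S x N
# log-likelihood matrix (S posterior draws, N test observations)
manual_elpd <- function(log_lik) {
  # pointwise elpd: log of the posterior-mean likelihood per observation
  log_mean_exp <- function(x) max(x) + log(mean(exp(x - max(x))))
  elpd_i <- apply(log_lik, 2, log_mean_exp)
  c(elpd = sum(elpd_i), se = sqrt(length(elpd_i)) * sd(elpd_i))
}
manual_elpd(model10$loglik$test)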
I can then run
loo_compare(model10$elpd_test, model11$elpd_test, model12$elpd_test)
       elpd_diff se_diff
model1  0.0       0.0
model3  0.0       0.1
model2 -0.2       0.1
Which seems fine? However, I’d quite like to compute model weights (assuming this is a sensible thing to do?), as I feel these are easier for non-expert reviewers to understand.
If I do this, I get:
> loo_model_weights(list(model10$loglik$test, model11$loglik$test, model12$loglik$test))
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
Method: stacking
------
       weight
model1 0.035
model2 0.000
model3 0.965
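(An aside, in case it clarifies what I’m after: I think the “direct” version of the weights would be something like the sketch below, i.e. compute the pointwise test-set log predictive density for each model and hand that N x 3 matrix straight to loo::stacking_weights(), so no PSIS step is involved. I’m assuming stacking_weights() will accept a matrix of pointwise lpd values from held-out data rather than specifically from LOO; the pointwise_lpd() helper is just my own.)

# Sketch: stacking weights computed directly from the held-out
# pointwise log predictive densities, with no PSIS involved
pointwise_lpd <- function(log_lik) {
  log_mean_exp <- function(x) max(x) + log(mean(exp(x - max(x))))
  apply(log_lik, 2, log_mean_exp)
}

lpd_point <- cbind(model10 = pointwise_lpd(model10$loglik$test),
                   model11 = pointwise_lpd(model11$loglik$test),
                   model12 = pointwise_lpd(model12$loglik$test))

loo::stacking_weights(lpd_point)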
Two questions, which might help resolve some of my faulty thinking :)
- Am I right to be running loo_model_weights() and loo_compare() here? My understanding was that loo is short for “leave-one-out”, and that the usual loo methods do some clever maths to estimate these values, whereas I am trying to test my model more directly?
- Can I ignore the Pareto k diagnostic values in this case? What do they actually mean here?
Thanks hugely
Alasdair