I fitted 8 models: 4 with fixed effects only, using stan_glm, and 4 mixed models, using stan_lmer. I'm trying to compare them to decide which one of the 8 I should proceed with.
All models seem to have converged very well, judging by several diagnostic checks (ESS, ACF, trace plots, ESS ratio). In the print(loo) output, all khat values are < 0.5 and the Monte Carlo SE of elpd_loo is 0.0 for every model.
The dataset contains N = 503 observations (students).
For example, one of the fixed-effects models fitted with stan_glm:
stan_glm(ec ~ g, data = p_1_36ec_trj, prior = normal(), chains = 4, iter = 10000, warmup = 1000)
And a stan_lmer model with a group-specific intercept and slope:
stan_lmer(ec ~ g + (g | chr), data = p_1_36ec_trj, iter = 10000, warmup = 1000, adapt_delta = 0.99)
The remaining three models of each type are specified in the same way.
I ran loo on all 8 models, e.g.
p_1_ec_g_loo <- loo(p_1_ec_g, save_psis = TRUE)
and then used loo_compare, which gives the following (I added a model name column for comparison with the stacking and weighting below):
(name)    elpd_diff  se_diff
model_5       0.0      0.0
model_7      -0.8      2.0
model_8      -3.3      2.1
model_1      -5.2      3.4
model_3      -8.2      3.9
model_4      -8.2      3.9
model_6     -14.7      5.5
model_2     -15.8      5.7
Now, based on the thread here, am I right to assume that the differences between these models are far too small to prefer one over the other? According to @avehtari, se_diff should be multiplied by 5 for this comparison, and none of the elpd_diff values here is larger in magnitude than 5 times its se_diff.
Furthermore, I used
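To make that comparison concrete, here is a minimal base-R sketch that recomputes |elpd_diff| / se_diff from the loo_compare table above. The data frame name cmp and the column name ratio are only for illustration, and the factor of 5 is simply the threshold mentioned above, not something this snippet justifies.

```r
# elpd_diff and se_diff copied from the loo_compare table above
cmp <- data.frame(
  model     = c("model_5", "model_7", "model_8", "model_1",
                "model_3", "model_4", "model_6", "model_2"),
  elpd_diff = c(0.0, -0.8, -3.3, -5.2, -8.2, -8.2, -14.7, -15.8),
  se_diff   = c(0.0, 2.0, 2.1, 3.4, 3.9, 3.9, 5.5, 5.7)
)

# Drop the reference model (first row), whose difference is 0 by construction
others <- cmp[-1, ]

# How many standard errors each model lies below the best one
others$ratio <- abs(others$elpd_diff) / others$se_diff

print(others)
max(others$ratio)
```

Every ratio can then be read against whichever multiple of se_diff is being used as the cut-off.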
# Pointwise elpd_loo values, one column per model (N x 8 matrix)
lpd_point_all <- cbind(
  p_1_ec_g_loo$pointwise[, "elpd_loo"],
  p_1_ec_vo_loo$pointwise[, "elpd_loo"],
  p_1_ec_g_vo_loo$pointwise[, "elpd_loo"],
  p_1_ec_g_vo_ia_loo$pointwise[, "elpd_loo"],
  p_1_ec_g_mm_chr_loo$pointwise[, "elpd_loo"],
  p_1_ec_vo_mm_chr_loo$pointwise[, "elpd_loo"],
  p_1_ec_g_vo_mm_chr_loo$pointwise[, "elpd_loo"],
  p_1_ec_g_vo_ia_mm_chr_loo$pointwise[, "elpd_loo"])
followed by
pbma_wts_all <- pseudobma_weights(lpd_point_all, BB=FALSE)
pbma_BB_wts_all <- pseudobma_weights(lpd_point_all)
stacking_wts_all <- stacking_weights(lpd_point_all)
round(cbind(pbma_wts_all, pbma_BB_wts_all, stacking_wts_all), 3)
which results in:
(rank)  (name)     pbma_wts_all  pbma_BB_wts_all  stacking_wts_all
model1  (model_5)     0.004          0.055            0.145
model2  (model_7)     0.000          0.002            0.000
model3  (model_8)     0.001          0.028            0.001
model4  (model_1)     0.000          0.006            0.000
model5  (model_3)     0.666          0.533            0.498
model6  (model_4)     0.000          0.001            0.000
model7  (model_6)     0.304          0.344            0.356
model8  (model_2)     0.025          0.032            0.000
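As a rough cross-check between the two outputs, the plain pseudo-BMA weights can be approximated by hand from the elpd_diff values in the loo_compare table. This sketch assumes that pseudo-BMA weights without the Bayesian bootstrap are proportional to exp(elpd_loo), so a constant shared by all models cancels and the differences suffice. The inputs are rounded to one decimal, so the result can only match pbma_wts_all approximately.

```r
# elpd_diff values copied from the loo_compare table above
elpd_diff <- c(model_5 = 0.0, model_7 = -0.8, model_8 = -3.3, model_1 = -5.2,
               model_3 = -8.2, model_4 = -8.2, model_6 = -14.7, model_2 = -15.8)

# Plain pseudo-BMA weights (no Bayesian bootstrap): proportional to exp(elpd_loo).
# Any constant shared by all models cancels, so differences are enough.
w <- exp(elpd_diff) / sum(exp(elpd_diff))

round(w, 3)
```

This gives roughly 0.67 for model_5 and 0.30 for model_7, close to the 0.666 and 0.304 that pbma_wts_all reports in rows model5 and model7. That may indicate that the rows model1 to model8 follow the column order of lpd_point_all rather than the loo_compare ranking, so the (name) labels I added in the weights table are worth rechecking.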
Which of the two is ‘more precise’: the weighting, or the comparison of elpd_diff and se_diff?
And do the models differ enough to objectively choose just one of them?