Hello,
I’m working on a project with Bayesian stacking of stan_glm() linear models in rstanarm. To compare the performance of the stack to the individual member models (stored in fits_list), I want to calculate the elpd_loo of the stack, but I am unsure how to do so correctly. Here’s a code snippet of how I calculate the stacked predictions for the outcome “y”:
# Compute posterior predictive density
ypredStack <- matrix(NA, nrow = n_draws, ncol = n_obs)
loglik_mat <- matrix(NA, nrow = n_draws, ncol = n_obs)
for (d in 1:n_draws) {
k <- sample(1:length(wtsStacking, size = 1, prob = wtsStacking)
ypredStack[d, ] <- posterior_predict(fits_list[[k]], draws = 1)
loglik_mat[d,] <- t(as.matrix(log_lik(fits_list[[k]])[d,])) #is this correct?
}
stacked_loo <- loo(loglik_mat)
The above code takes 1 draw from the log-likelihood matrix for the chosen model for each iteration over all draws. However, the loo results from this matrix shows the stack performs notably worse than 6/7 of the individual linear models in the ensemble - which is confusing to me.
My question is - is this method a valid way to extract a log likelihood matrix for my ensemble? If not, is there a better method that allows me to make elpd_loo (or some other metric) comparisons directly?
Thank you for any and all guidance!