Stacked elpd_loo estimation

Hello,

I’m working on a project with Bayesian stacking of stan_glm() linear models in rstanarm. To compare the performance of the stack to the individual member models (stored in fits_list), I want to calculate the elpd_loo of the stack, but I am unsure how to do so correctly. Here’s a code snippet of how I calculate the stacked predictions for the outcome “y”:

# Compute posterior predictive density 

ypredStack <- matrix(NA, nrow = n_draws, ncol = n_obs)
loglik_mat <- matrix(NA, nrow = n_draws, ncol = n_obs)

for (d in 1:n_draws) {
  k <- sample(seq_along(wtsStacking), size = 1, prob = wtsStacking)
  ypredStack[d, ] <- posterior_predict(fits_list[[k]], draws = 1)
  loglik_mat[d,] <- t(as.matrix(log_lik(fits_list[[k]])[d,])) #is this correct?
}
stacked_loo <- loo(loglik_mat) 

The above code takes 1 draw from the log-likelihood matrix of the chosen model for each iteration over all draws. However, the loo results from this matrix show that the stack performs notably worse than 6 of the 7 individual linear models in the ensemble, which is confusing to me.

My question is - is this method a valid way to extract a log likelihood matrix for my ensemble? If not, is there a better method that allows me to make elpd_loo (or some other metric) comparisons directly?

Thank you for any and all guidance!

With Bayesian model averaging, where each model gets a weight in a simplex, you can weight the likelihoods to form a mixture and then move to the log scale to get a log likelihood. Because stacking weights are also restricted to a simplex, you can define a mixture in the same way.

In practice, this means for each of the data items in your data set, you need to calculate a weighted average of the likelihoods on the log scale. With N data items and K models, it’s

log_lik[n] = log(SUM_k w[k] * likelihood[n, k])
           = LOG_SUM_EXP_k (log(w[k]) + log(likelihood[n, k]))

where w[k] is the weight of the k-th model with w forming a simplex (non-negative values that sum to 1), and likelihood[n, k] is the likelihood of the n-th data item in the k-th model.
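Here is a minimal sketch of that mixture computation in R. Toy random matrices stand in for what `log_lik(fits_list[[k]])` would return from real fits, and `wtsStacking` here is a made-up 3-model simplex; only the accumulation logic is the point.

```r
# Toy stand-ins: with real models these would come from log_lik(fits_list[[k]])
# and loo_model_weights(); here they are simulated for illustration only.
set.seed(1)
K <- 3; S <- 100; N <- 10                      # models, posterior draws, observations
loglik_list <- lapply(1:K, function(k) matrix(rnorm(S * N, mean = -1), S, N))
wtsStacking <- c(0.5, 0.3, 0.2)                # simplex: non-negative, sums to 1

# Numerically stable elementwise log-sum-exp of two matrices
log_sum_exp2 <- function(a, b) {
  m <- pmax(a, b)
  m + log(exp(a - m) + exp(b - m))
}

# Accumulate log(SUM_k w[k] * likelihood[n, k]) across models,
# staying on the log scale throughout
loglik_stack <- matrix(-Inf, S, N)
for (k in 1:K) {
  loglik_stack <- log_sum_exp2(loglik_stack,
                               log(wtsStacking[k]) + loglik_list[[k]])
}

# With real fits, loglik_stack would then be passed to loo::loo()
```

Note that the same S x N shape is kept throughout, so the resulting matrix can be passed to `loo()` exactly like a single model's log-likelihood matrix.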


@Bob_Carpenter did give the correct answer, but a further complication is that the weights have been optimized to maximize elpd, so the resulting weighted elpd is over-optimistic. To get a fair comparison against the individual models, the stacking procedure needs to be cross-validated, too (e.g., using K-fold-CV).

Cool. Thanks for the clarification.

Thank you both for the feedback and clarifications. I’ll keep this in mind going forward!