Using Kullback-Liebler divergence to summarize stacking results with LOO

dkaplan · October 5, 2020, 9:10pm

Hi all, I have a question regarding how to summarize the stacking predictions with the actual values using KL-divergence. The code is below, and everything works fine. As you can see, I’m looking at 5 reasonable models of reading performance. I obtain the stacking, pseudoBMA and pseudoBMABB weights for each model. If we look at ypredStacking for example, we have a matrix which is, say d x n, where the rows are the number of draws and the columns are the students. So for each student I have d draws from their posterior predictive distribution. If I want to compare the predictive distribution of reading scores for the sample to the actual reading scores, I am taking the column means of ypredStacking, which gives me an n x 1 vector of predicted scores and then running KLD on the n x 1 actual values. As I mentioned, this all works, but I would appreciate a sanity check on my thinking.

Thanks in advance,

David

library(rstanarm)
library(loo)
library(LaplacesDemon)

PISAdata <- read.csv(file.choose(), header=T)
PISAdata <- PISAdata[1:4000,]
PISAdata$Female <- ifelse(PISAdata$Female==“Female”,1,0)

Here is where you would write out the separate models. There would be 6 models in total.

Notice that the first model is just the intercept, and then each model updates the first.

Model 1: FEMALE, ESCS, HOMEPOS, ICTRES

Model 2: JOYREAD, PISADIFF, SCREADCOMP, SCREADDIF

Model 3: METASUM, GFOFAIL, MASTGOAL, SWBP, WORKMAST, ADAPTIVITY, COMPETE

Model 4: PERFEED, ADAPTIVE, TEACHINT, BELONG

Model 5: Full model with all predictors

attach(d)
fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ 1, data = d, seed = 832762) # This is the baseline model
fits[[2]] <- update(fits[[1]], formula = PV1READ ~ ) # model 1 variables
fits[[3]] <- update(fits[[1]], formula = PV1READ ~ ) # model 2 variables
fits[[4]] <- update(fits[[1]], formula = PV1READ ~ ) # model 4 variables
fits[[5]] <- update(fits[[1]], formula = PV1READ ~ ) # model 5 variables
fits[[6]] <- update(fits[[1]], formula = PV1READ ~ ) # full model

Compute loo

loo_list <- lapply(fits, loo, cores = 4)

Compute stacking weights

wtsStacking <- loo_model_weights(loo_list, method = “stacking”)
print(wtsStacking)

wtsPseudoBMA <- loo_model_weights(loo_list, method = “pseudobma”, BB=FALSE)
print(wtsPseudoBMA)

wtsPseudoBMABB <- loo_model_weights(loo_list, method = “pseudobma”)
print(wtsPseudoBMABB)

Average predictions (sample with probability equal to weights)

n_draws <- nrow(as.matrix(fits[[1]]))
ypredStacking <- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
k <- sample(1:length(wts), size = 1, prob = wtsStacking)
ypredStacking[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

n_draws <- nrow(as.matrix(fits[[1]]))
ypredPseudoBMA<- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
k <- sample(1:length(wts), size = 1, prob = wtsPseudoBMA)
ypredPseudoBMA[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

n_draws <- nrow(as.matrix(fits[[1]]))
ypredPseudoBMABB<- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
k <- sample(1:length(wts), size = 1, prob = wtsPseudoBMABB)
ypredPseudoBMABB[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

This provides the distribution of predicted values and compares them to the

actual values using the KLD. Please make a note of all of the values.

ypredStacking <- colMeans(ypredStacking)
KLD(ypredStacking,PV1READ)

ypredPseudoBMA <- colMeans(ypredPseudoBMA)
KLD(ypredPseudoBMA,PV1READ)

ypredPseudoBMABB <- colMeans(ypredPseudoBMABB)
KLD(ypredPseudoBMABB,PV1READ)

dkaplan · October 9, 2020, 6:01pm

I would like to add a question to this post. I have also calculated Bayesian model averaging via Zeugner and Feldkircher’s “BMS” program. I all the variables that I have used for the loo analysis in my previous post. As in the previous post where I am obtaining the KLD values, I can also get KLD for with the BMS program. I am presuming they are comparable (and in fact they are close), but again, I’m checking my reasoning with the group.

Thanks

David

avehtari · October 9, 2020, 6:27pm

I don’t understand what is happening here?

You can also simplify by using a deterministic resampling approach which has negligible small bias if there are only six models and thousands of posterior draws. E.g. if you want to get 1000 combined draws then from each model you get 1000*weight draws (rounded so that the total is 1000).

Assuming KLD is a function that can estimate KL-divergence using two samples from two distributions, then this seems ok.

If you use the same KLD function and the same approach of obtaining the draws from the model averaged predictive distribution then yes.

dkaplan · October 9, 2020, 7:13pm

Thanks Aki. The code was just putting in some place holders and my student filled in the blanks. Everything works very well.

David

Topic		Replies	Views
Question on model stacking -- averaging posteriors of parameters, not predictions Modeling loo , model-comparison	19	1786	July 24, 2025
Calculating model averaged estimates using loo Modeling loo	8	1307	January 31, 2024
Stacked elpd_loo estimation Modeling performance , loo , rstanarm , cross-validation	4	221	April 26, 2024
Loo_compare vs averaging/weighting via stacking or pseudo-BMA weighting rstanarm fitting-issues , loo	2	926	February 5, 2021
Generating LOO-CV predictions from stacked predictive distributions brms loo , brms	0	593	April 5, 2022

Using Kullback-Liebler divergence to summarize stacking results with LOO

Here is where you would write out the separate models. There would be 6 models in total.

Notice that the first model is just the intercept, and then each model updates the first.

Model 1: FEMALE, ESCS, HOMEPOS, ICTRES

Model 2: JOYREAD, PISADIFF, SCREADCOMP, SCREADDIF

Model 3: METASUM, GFOFAIL, MASTGOAL, SWBP, WORKMAST, ADAPTIVITY, COMPETE

Model 4: PERFEED, ADAPTIVE, TEACHINT, BELONG

Model 5: Full model with all predictors

Compute loo

Compute stacking weights

Average predictions (sample with probability equal to weights)

This provides the distribution of predicted values and compares them to the

actual values using the KLD. Please make a note of all of the values.

Related topics