Using Kullback-Liebler divergence to summarize stacking results with LOO

Hi all, I have a question regarding how to summarize the stacking predictions with the actual values using KL-divergence. The code is below, and everything works fine. As you can see, I’m looking at 5 reasonable models of reading performance. I obtain the stacking, pseudoBMA and pseudoBMABB weights for each model. If we look at ypredStacking for example, we have a matrix which is, say d x n, where the rows are the number of draws and the columns are the students. So for each student I have d draws from their posterior predictive distribution. If I want to compare the predictive distribution of reading scores for the sample to the actual reading scores, I am taking the column means of ypredStacking, which gives me an n x 1 vector of predicted scores and then running KLD on the n x 1 actual values. As I mentioned, this all works, but I would appreciate a sanity check on my thinking.

Thanks in advance,

David

library(rstanarm)
library(loo)
library(LaplacesDemon)

PISAdata <- read.csv(file.choose(), header=T)
PISAdata <- PISAdata[1:4000,]
PISAdata$Female <- ifelse(PISAdata$Female==“Female”,1,0)

Here is where you would write out the separate models. There would be 6 models in total.

Notice that the first model is just the intercept, and then each model updates the first.

Model 1: FEMALE, ESCS, HOMEPOS, ICTRES

Model 2: JOYREAD, PISADIFF, SCREADCOMP, SCREADDIF

Model 3: METASUM, GFOFAIL, MASTGOAL, SWBP, WORKMAST, ADAPTIVITY, COMPETE

Model 4: PERFEED, ADAPTIVE, TEACHINT, BELONG

Model 5: Full model with all predictors

attach(d)
fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ 1, data = d, seed = 832762) # This is the baseline model
fits[[2]] <- update(fits[[1]], formula = PV1READ ~ ) # model 1 variables
fits[[3]] <- update(fits[[1]], formula = PV1READ ~ ) # model 2 variables
fits[[4]] <- update(fits[[1]], formula = PV1READ ~ ) # model 4 variables
fits[[5]] <- update(fits[[1]], formula = PV1READ ~ ) # model 5 variables
fits[[6]] <- update(fits[[1]], formula = PV1READ ~ ) # full model

Compute loo

loo_list <- lapply(fits, loo, cores = 4)

Compute stacking weights

wtsStacking <- loo_model_weights(loo_list, method = “stacking”)
print(wtsStacking)

wtsPseudoBMA <- loo_model_weights(loo_list, method = “pseudobma”, BB=FALSE)
print(wtsPseudoBMA)

wtsPseudoBMABB <- loo_model_weights(loo_list, method = “pseudobma”)
print(wtsPseudoBMABB)

Average predictions (sample with probability equal to weights)

n_draws <- nrow(as.matrix(fits[[1]]))
ypredStacking <- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
k <- sample(1:length(wts), size = 1, prob = wtsStacking)
ypredStacking[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

n_draws <- nrow(as.matrix(fits[[1]]))
ypredPseudoBMA<- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
k <- sample(1:length(wts), size = 1, prob = wtsPseudoBMA)
ypredPseudoBMA[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

n_draws <- nrow(as.matrix(fits[[1]]))
ypredPseudoBMABB<- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
k <- sample(1:length(wts), size = 1, prob = wtsPseudoBMABB)
ypredPseudoBMABB[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

This provides the distribution of predicted values and compares them to the

actual values using the KLD. Please make a note of all of the values.

ypredStacking <- colMeans(ypredStacking)
KLD(ypredStacking,PV1READ)

ypredPseudoBMA <- colMeans(ypredPseudoBMA)
KLD(ypredPseudoBMA,PV1READ)

ypredPseudoBMABB <- colMeans(ypredPseudoBMABB)
KLD(ypredPseudoBMABB,PV1READ)

I would like to add a question to this post. I have also calculated Bayesian model averaging via Zeugner and Feldkircher’s “BMS” program. I all the variables that I have used for the loo analysis in my previous post. As in the previous post where I am obtaining the KLD values, I can also get KLD for with the BMS program. I am presuming they are comparable (and in fact they are close), but again, I’m checking my reasoning with the group.

Thanks

David

I don’t understand what is happening here?

You can also simplify by using a deterministic resampling approach which has negligible small bias if there are only six models and thousands of posterior draws. E.g. if you want to get 1000 combined draws then from each model you get 1000*weight draws (rounded so that the total is 1000).

Assuming KLD is a function that can estimate KL-divergence using two samples from two distributions, then this seems ok.

If you use the same KLD function and the same approach of obtaining the draws from the model averaged predictive distribution then yes.

1 Like

Thanks Aki. The code was just putting in some place holders and my student filled in the blanks. Everything works very well.

David

1 Like