Quick question about update in rstanarm

Hi all,

Following the vignette in rstanarm fo computing loo, I am using the following code, which works just fine

fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ 1, data = PISAdata, seed = 832762) # This is the baseline model 1
fits[[2]] <- update(fits[[1]], formula = PV1READ ~ Female + ESCS + HOMEPOS + ICTRES) # model 1 variables
fits[[3]] <- update(fits[[1]], formula = PV1READ ~ JOYREAD + PISADIFF + SCREADDIFF + SCREADCOMP) # model 2 variables
fits[[4]] <- update(fits[[1]], formula = PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE) # model 3 variables
fits[[5]] <- update(fits[[1]], formula = PV1READ ~ PERFEED + TEACHINT + BELONG) # model 3 variables

My question is whether it is necessary to compute fits[[1]] the baseline model, or was this model in the vignette considered a reasonable model for the applied Milk example. Once the weights are obtained, are the weights attached to model 1 (the baseline model) are they interpretable. For this example, by the way, the weights are 0.000.

Thanks

David

}


To include mathematical notation in your post put LaTeX syntax between two $ symbols, e.g.,
p(\theta | y) \propto p(\theta) p(y | \theta).

Don’t forget to add relevant tags to your topic (top right of this form) for application area and/or class of models you work with.

Actually, the vignette is here.

What weights? Did you call loo_model_weights?

Yes. Sorry, here is the entire code (as a markdown)


title: “Bayesian Stacking”
date: “Oct. 11, 2020”

output: html_document

knitr::opts_chunk$set(echo = TRUE)
library(rstanarm)
library(loo)
library(LaplacesDemon)
library(BMS)
PISAdata <- read.csv("pisa.2018.usa.imp.read.csv", header=T) 
PISAdata <- PISAdata[1:4000,]
PISAdata$Female <- ifelse(PISAdata$Female=="Female",1,0)

Four models of interest to stack.

Model 1: Baseline model

Model 2: FEMALE, ESCS, HOMEPOS, ICTRES

Model 3: JOYREAD, PISADIFF, SCREADCOMP, SCREADDIF

Model 4: METASUM, GFOFAIL, MASTGOAL, SWBP, WORKMAST, ADAPTIVITY, COMPETE

Model 5: PERFEED, ADAPTIVE, TEACHINT, BELONG

attach(PISAdata)
fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ 1, data = PISAdata, seed = 832762) # This is the baseline model 1
fits[[2]] <- update(fits[[1]], formula = PV1READ ~ Female + ESCS + HOMEPOS + ICTRES) # model 1 variables
fits[[3]] <- update(fits[[1]], formula = PV1READ ~ JOYREAD + PISADIFF + SCREADDIFF + SCREADCOMP) # model 2 variables
fits[[4]] <- update(fits[[1]], formula = PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE) # model 3 variables
fits[[5]] <- update(fits[[1]], formula = PV1READ ~ PERFEED + TEACHINT + BELONG) # model 3 variables

Compute loo

loo_list <- lapply(fits, loo, cores = 4)

Compute stacking weights. The first model is the intercept

wtsStacking <- loo_model_weights(loo_list, method = "stacking")
print(wtsStacking)

Notice that the weight is mostly given to model 3.

Average predictions (sample with probability equal to weights)

n_draws <- nrow(as.matrix(fits[[1]])) 
ypredStacking <- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
  k <- sample(1:length(wtsStacking), size = 1, prob = wtsStacking)
  ypredStacking[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

This provides the distribution of predicted values and compares them to the actual values using the KLD.

ypredStacking <- colMeans(ypredStacking)
#kld1 <- KLD(ypredStacking,PV1READ)$sum.KLD.py.px
KLD(ypredStacking,PV1READ)$sum.KLD.py.px

If you are using loo_model_weights, then you should calculate all models under consideration even though fits[[1]] seems implausible both before and after conditioning on the data. I don’t think those weights are particularly interpretable, unless they are close to zero or one. In general, they are merely the weights that maximize the ELPD.

Thanks Ben. I just want to be clear as to what the update command is doing. It seems from the Milk example, that this baseline intercept only model is needed so the update command has something to update. But the output yields a weight for the baseline model (albeit very close to zero). So, I now just ran separate stan models

fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ Female + ESCS + HOMEPOS + ICTRES) # model 1 variables
fits[[2]] <- stan_glm(PV1READ ~ JOYREAD + PISADIFF + SCREADDIFF + SCREADCOMP) # model 2 variables
fits[[3]] <- stan_glm(PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE) # model 3 variables
fits[[4]] <- stan_glm(PV1READ ~ PERFEED + TEACHINT + BELONG) # model 3 variables

This gives me the following weights without the need for the baseline model.
Method: stacking

   weight

model1 0.014
model2 0.581
model3 0.405
model4 0.000

This seems fine, so I’m presuming that the baseline model in the Milk example was a model of interest. Would you agree.

Again, thanks so much.

David

Yes. The update thing just re-uses the same data (and priors and other default arguments).