Quick question about update in rstanarm

dkaplan · October 11, 2020, 5:41pm

Hi all,

Following the vignette in rstanarm fo computing loo, I am using the following code, which works just fine

fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ 1, data = PISAdata, seed = 832762) # This is the baseline model 1
fits[[2]] <- update(fits[[1]], formula = PV1READ ~ Female + ESCS + HOMEPOS + ICTRES) # model 1 variables
fits[[3]] <- update(fits[[1]], formula = PV1READ ~ JOYREAD + PISADIFF + SCREADDIFF + SCREADCOMP) # model 2 variables
fits[[4]] <- update(fits[[1]], formula = PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE) # model 3 variables
fits[[5]] <- update(fits[[1]], formula = PV1READ ~ PERFEED + TEACHINT + BELONG) # model 3 variables

My question is whether it is necessary to compute fits[[1]] the baseline model, or was this model in the vignette considered a reasonable model for the applied Milk example. Once the weights are obtained, are the weights attached to model 1 (the baseline model) are they interpretable. For this example, by the way, the weights are 0.000.

Thanks

David

}

To include mathematical notation in your post put LaTeX syntax between two $ symbols, e.g.,
p(\theta | y) \propto p(\theta) p(y | \theta).

Don’t forget to add relevant tags to your topic (top right of this form) for application area and/or class of models you work with.

dkaplan · October 11, 2020, 6:33pm

Actually, the vignette is here.

bgoodri · October 11, 2020, 7:04pm

What weights? Did you call loo_model_weights?

dkaplan · October 11, 2020, 7:14pm

Yes. Sorry, here is the entire code (as a markdown)

title: “Bayesian Stacking”
date: “Oct. 11, 2020”

output: html_document

knitr::opts_chunk$set(echo = TRUE)

library(rstanarm)
library(loo)
library(LaplacesDemon)
library(BMS)

PISAdata <- read.csv("pisa.2018.usa.imp.read.csv", header=T) 
PISAdata <- PISAdata[1:4000,]
PISAdata$Female <- ifelse(PISAdata$Female=="Female",1,0)

Four models of interest to stack.

Model 1: Baseline model

Model 2: FEMALE, ESCS, HOMEPOS, ICTRES

Model 3: JOYREAD, PISADIFF, SCREADCOMP, SCREADDIF

Model 4: METASUM, GFOFAIL, MASTGOAL, SWBP, WORKMAST, ADAPTIVITY, COMPETE

Model 5: PERFEED, ADAPTIVE, TEACHINT, BELONG

attach(PISAdata)
fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ 1, data = PISAdata, seed = 832762) # This is the baseline model 1
fits[[2]] <- update(fits[[1]], formula = PV1READ ~ Female + ESCS + HOMEPOS + ICTRES) # model 1 variables
fits[[3]] <- update(fits[[1]], formula = PV1READ ~ JOYREAD + PISADIFF + SCREADDIFF + SCREADCOMP) # model 2 variables
fits[[4]] <- update(fits[[1]], formula = PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE) # model 3 variables
fits[[5]] <- update(fits[[1]], formula = PV1READ ~ PERFEED + TEACHINT + BELONG) # model 3 variables

Compute loo

loo_list <- lapply(fits, loo, cores = 4)

Compute stacking weights. The first model is the intercept

wtsStacking <- loo_model_weights(loo_list, method = "stacking")
print(wtsStacking)

Notice that the weight is mostly given to model 3.

Average predictions (sample with probability equal to weights)

n_draws <- nrow(as.matrix(fits[[1]])) 
ypredStacking <- matrix(NA, nrow = n_draws, ncol = nobs(fits[[1]]))
for (d in 1:n_draws) {
  k <- sample(1:length(wtsStacking), size = 1, prob = wtsStacking)
  ypredStacking[d, ] <- posterior_predict(fits[[k]], draws = 1)
}

This provides the distribution of predicted values and compares them to the actual values using the KLD.

ypredStacking <- colMeans(ypredStacking)
#kld1 <- KLD(ypredStacking,PV1READ)$sum.KLD.py.px

KLD(ypredStacking,PV1READ)$sum.KLD.py.px

bgoodri · October 11, 2020, 11:55pm

If you are using loo_model_weights, then you should calculate all models under consideration even though fits[[1]] seems implausible both before and after conditioning on the data. I don’t think those weights are particularly interpretable, unless they are close to zero or one. In general, they are merely the weights that maximize the ELPD.

dkaplan · October 12, 2020, 12:33am

Thanks Ben. I just want to be clear as to what the update command is doing. It seems from the Milk example, that this baseline intercept only model is needed so the update command has something to update. But the output yields a weight for the baseline model (albeit very close to zero). So, I now just ran separate stan models

fits <- list()
fits[[1]] <- stan_glm(PV1READ ~ Female + ESCS + HOMEPOS + ICTRES) # model 1 variables
fits[[2]] <- stan_glm(PV1READ ~ JOYREAD + PISADIFF + SCREADDIFF + SCREADCOMP) # model 2 variables
fits[[3]] <- stan_glm(PV1READ ~ METASUM + GFOFAIL + MASTGOAL + SWBP + WORKMAST + ADAPTIVITY + COMPETE) # model 3 variables
fits[[4]] <- stan_glm(PV1READ ~ PERFEED + TEACHINT + BELONG) # model 3 variables

This gives me the following weights without the need for the baseline model.
Method: stacking

   weight

model1 0.014
model2 0.581
model3 0.405
model4 0.000

This seems fine, so I’m presuming that the baseline model in the Milk example was a model of interest. Would you agree.

Again, thanks so much.

David

bgoodri · October 12, 2020, 12:36am

Yes. The update thing just re-uses the same data (and priors and other default arguments).

Topic		Replies	Views
Lfo-cv update() function for stan_glmer()? rstanarm loo	6	567	February 4, 2019
Calculating model averaged estimates using loo Modeling loo	8	1241	January 31, 2024
How to decide on the best model? Modeling loo	3	599	December 3, 2020
Update() in brms and rstanarm Modeling techniques , specification	1	608	January 26, 2021
Loo_model_weights() returning inconsistent result General rstan , loo , r , brms	4	573	May 20, 2021