Hi all again,
I’m back again with the eight-schools example. Because my target model (not this schools one) is not running, I’m trying all sorts of non-centered reparameterizations and vectorizations as I see fit, somewhat similar to what I found here. While doing this, I ran across something I don’t understand: Stan models that are the same except for vectorization yield different divergence statistics and parameter estimates.
I compare six different versions of the eight-schools Stan model. Three of them are centered parameterizations where
- the model is in a loop form,
- the model is vectorized, and
- the model is vectorized and has no generated quantities block (i.e. no log-likelihood computation).
The remaining three are their non-centered counterparts. So the key difference between (1) and (2) is whether the model block contains
for (j in 1:J) {
  theta[j] ~ normal(mu, tau);
  y[j] ~ normal(theta[j], sigma);
}
or
theta ~ normal(mu, tau);
y ~ normal(theta, sigma);
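For reference, the non-centered versions follow the usual reparameterization: they sample standardized effects and shift and scale them in transformed parameters instead of sampling theta directly. A rough sketch of the vectorized non-centered model (parameter names here are illustrative and may differ slightly from the attached files):
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta_tilde;            // standardized school effects
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * theta_tilde;   // shift and scale the standard-normal draws
}
model {
  theta_tilde ~ normal(0, 1);
  y ~ normal(theta, sigma);
}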
I initially thought that, among the centered models, all three versions should yield exactly the same output and differ only in how fast they get there, and similarly for the non-centered ones. What I think I found instead is that
- looped versions have fewer divergences,
- looped versions have much tighter parameter estimates,
- looped versions perform better, obviously (in terms of the comparison with the loo package),
- the same vectorized model with and without the generated quantities computation has different divergences and E-BFMIs, and
- these disparities are smaller in the non-centered versions.
I don’t understand: should vectorizing or not vectorizing make any difference in diagnostics or estimation quality? Or having or not having a generated quantities block? If it does, I’m not sure what to do, because then there is a tradeoff between the vectorized and looped versions in terms of speed versus quality.
I attach the six Stan models that I used and the R script with which I compiled and ran them. The whole thing takes about 5 minutes. Please let me know if I’m doing anything wrong.
Thank you so much in advance.
eight_schools_cp_model_looped.stan (497 Bytes)
eight_schools_cp_model_vectorized_no_gq.stan (260 Bytes)
eight_schools_ncp_model_vectorized_no_gq.stan (359 Bytes)
eight_schools_cp_model_vectorized.stan (459 Bytes)
eight_schools_ncp_model_looped.stan (596 Bytes)
eight_schools_ncp_model_vectorized.stan (560 Bytes)
library(rstan)
library(loo)
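# Note: check_div() and check_energy() used below are the diagnostic helpers
# from the stan_utility.R script that accompanies the Stan case studies;
# they are not part of rstan, so that script needs to be sourced as well.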
schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
# Model 1 --------------------------------------------------------
eight_schools_cp_model_vectorized <-
stan(file = "eight_schools_cp_model_vectorized.stan",
data = schools_dat, seed = 194838,
iter = 2000, chains = 4)
check_div(eight_schools_cp_model_vectorized)
check_energy(eight_schools_cp_model_vectorized)
# "85 of 4000 iterations ended with a divergence (2.125%)"
# (No E-BFMI problem)
# 0.263, 0.188, 0.169, 0.148 seconds each chain.
# Having vs. not having the generated quantities block changes the divergences?! Why?
eight_schools_cp_model_vectorized_no_gq <-
stan(file = "eight_schools_cp_model_vectorized_no_gq.stan",
data = schools_dat, seed = 194838,
iter = 2000, chains = 4)
check_div(eight_schools_cp_model_vectorized_no_gq)
check_energy(eight_schools_cp_model_vectorized_no_gq)
# "158 of 4000 iterations ended with a divergence (3.95%)"
# [1] "Chain 4: E-BFMI = 0.152798760202411"
# [1] "E-BFMI below 0.2 indicates you may need to reparameterize your model"
# Model 2 --------------------------------------------------------
eight_schools_cp_model_looped <-
stan(file = "eight_schools_cp_model_looped.stan",
data = schools_dat, seed = 194838,
iter = 2000, chains = 4)
check_div(eight_schools_cp_model_looped)
check_energy(eight_schools_cp_model_looped)
# "0 of 4000 iterations ended with a divergence (0%)"
# (No E-BFMI problem)
# 0.247, 0.159, 0.144, 0.151 seconds each chain
# Model 3 --------------------------------------------------------
eight_schools_ncp_model_vectorized <-
stan(file = "eight_schools_ncp_model_vectorized.stan",
data = schools_dat, seed = 194838,
iter = 2000, chains = 4)
check_div(eight_schools_ncp_model_vectorized)
check_energy(eight_schools_ncp_model_vectorized)
# "1 of 4000 iterations ended with a divergence (0.025%)"
# (No E-BFMI problem)
# 0.192, 0.183, 0.243, 0.103 seconds each chain
# Model 3.5 --------------------------------------------------------
eight_schools_ncp_model_vectorized_no_gq <-
stan(file = "eight_schools_ncp_model_vectorized_no_gq.stan",
data = schools_dat, seed = 194838,
iter = 2000, chains = 4)
check_div(eight_schools_ncp_model_vectorized_no_gq)
check_energy(eight_schools_ncp_model_vectorized_no_gq)
# "3 of 4000 iterations ended with a divergence (0.075%)"
# (No E-BFMI problem)
# 0.128, 0.278, 0.206, 0.121 seconds each chain
# Model 7 --------------------------------------------------------
eight_schools_ncp_model_looped <-
stan(file = "eight_schools_ncp_model_looped.stan",
data = schools_dat, seed = 194838,
iter = 2000, chains = 4)
check_div(eight_schools_ncp_model_looped)
check_energy(eight_schools_ncp_model_looped)
# "0 of 4000 iterations ended with a divergence (0%)"
# (No E-BFMI problem)
# 0.356, 0.211, 0.239, 0.235 seconds
# Set the no-gq fits aside and drop the original objects so the loo loop
# below only sees models that actually compute log_lik in generated quantities
cp_no_gq <- eight_schools_cp_model_vectorized_no_gq
ncp_no_gq <- eight_schools_ncp_model_vectorized_no_gq
rm(list = ls(pattern = "^eight_schools_.*_no_gq$"))
for (item in ls(pattern = "eight_schools_")) {
  print(item)
  short_name <- sub("^eight_schools_", "", item)   # e.g. "cp_model_looped"
  # Extract the pointwise log-likelihood and store the resulting loo object
  # (kept under the "_log_lik" suffix so the compare() calls below still work)
  assign(paste0(short_name, "_log_lik"), loo(extract_log_lik(get(item))))
}
compare(cp_model_looped_log_lik, cp_model_vectorized_log_lik) # -2.8
compare(ncp_model_looped_log_lik, ncp_model_vectorized_log_lik) # -2.8
compare(ncp_model_vectorized_log_lik, cp_model_vectorized_log_lik) # 0
source("http://peterhaschke.com/Code/multiplot.R")
p1 <- plot(eight_schools_cp_model_looped, pars = c("mu", "tau", "theta"))
p2 <- plot(eight_schools_cp_model_vectorized, pars = c("mu", "tau", "theta"))
p3 <- plot(eight_schools_ncp_model_looped, pars = c("mu", "tau", "theta"))
p4 <- plot(eight_schools_ncp_model_vectorized, pars = c("mu", "tau", "theta"))
multiplot(p1, p2, p3, p4, cols = 2)
The sessionInfo() output is as follows:
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] loo_1.1.0 bindrcpp_0.2 rstantools_1.3.0
[4] rstan_2.16.2 StanHeaders_2.16.0-1 magrittr_1.5
[7] dplyr_0.7.3 purrr_0.2.3 readr_1.1.1
[10] tidyr_0.7.1 tibble_1.3.4 ggplot2_2.2.1.9000
[13] tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] reshape2_1.4.2 haven_1.1.0 lattice_0.20-35
[4] colorspace_1.3-2 stats4_3.4.1 rlang_0.1.2
[7] foreign_0.8-69 glue_1.1.1 modelr_0.1.1
[10] readxl_1.0.0 matrixStats_0.52.2 bindr_0.1
[13] plyr_1.8.4 stringr_1.2.0 munsell_0.4.3
[16] gtable_0.2.0 cellranger_1.1.0 rvest_0.3.2
[19] codetools_0.2-15 psych_1.7.8 labeling_0.3
[22] inline_0.3.14 forcats_0.2.0 parallel_3.4.1
[25] broom_0.4.2 Rcpp_0.12.12 KernSmooth_2.23-15
[28] scales_0.5.0.9000 jsonlite_1.5 gridExtra_2.3
[31] mnormt_1.5-5 hms_0.3 digest_0.6.12
[34] stringi_1.1.5 tools_3.4.1 lazyeval_0.2.0
[37] pkgconfig_2.0.1 xml2_1.1.1 lubridate_1.6.0
[40] rstudioapi_0.7 assertthat_0.2.0 httr_1.3.1
[43] R6_2.2.2 nlme_3.1-131 compiler_3.4.1