Can vectorized versions yield different divergences/loo values?

Hi all again,

I’m back again with the eight-schools example. Because my target model (not this schools one) is not running, I’m trying all sorts of non-centered reparameterizations and vectorizations as I see fit, somewhat similar to what I found here. While doing this, I ran across something that I don’t understand: Stan models that are otherwise the same but differ in vectorization yield different divergence statistics and parameter estimates.

I compare six different versions of the eight-schools Stan model. Three of them are centered parameterizations, where

  1. the model is in a loop form,
  2. the model is vectorized, and
  3. the model is vectorized and has no generated quantities block (i.e., no log-likelihood computation).

The remaining three are their non-centered counterparts. So the key difference between (1) and (2) is whether the model block contains

  for(j in 1:J){
    theta[j] ~ normal(mu, tau);
    y[j] ~ normal(theta[j], sigma);
  }

or

  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);

I initially thought that the three centered versions should yield exactly the same output, differing only in how fast they get there, and similarly for the non-centered ones. What I think I found instead was that

  1. looped versions have fewer divergences,
  2. looped versions have much tighter parameter estimates,
  3. looped versions consequently perform better in comparisons with the loo package,
  4. the same vectorized model with and without the generated quantities computation has different divergences and E-BFMIs, and
  5. these disparities are smaller in the non-centered versions.

I don’t understand: should vectorizing or not vectorizing make any difference in diagnostics or estimation quality? Or having or not having a generated quantities block? If it does, I’m not sure what to do, because then there is a tradeoff between the vectorized and looped versions in terms of speed versus quality.

I attach the six Stan models that I used and the R script with which I fit them. The whole thing takes about 5 minutes. Please let me know if I’m doing anything wrong.

Thank you so much in advance.

eight_schools_cp_model_looped.stan (497 Bytes)
eight_schools_cp_model_vectorized_no_gq.stan (260 Bytes)
eight_schools_ncp_model_vectorized_no_gq.stan (359 Bytes)
eight_schools_cp_model_vectorized.stan (459 Bytes)
eight_schools_ncp_model_looped.stan (596 Bytes)
eight_schools_ncp_model_vectorized.stan (560 Bytes)

library(rstan)
library(loo)
# check_div() and check_energy() below are not part of rstan or loo; they are the
# diagnostic helpers from Betancourt's Robust Statistical Workflow case study
# (assuming that helper script is saved locally as stan_utility.R).
source("stan_utility.R")

schools_dat <- list(J = 8, 
                    y = c(28,  8, -3,  7, -1,  1, 18, 12),
                    sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

# Model 1 --------------------------------------------------------
eight_schools_cp_model_vectorized <- 
    stan(file = "eight_schools_cp_model_vectorized.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_cp_model_vectorized) 
check_energy(eight_schools_cp_model_vectorized)
# "85 of 4000 iterations ended with a divergence (2.125%)"
# (No E-BFMI problem)
# 0.263, 0.188, 0.169, 0.148 seconds each chain.
         
# Model 1.5 ------------------------------------------------------
# Having vs. not having the generated quantities block changes the divergences?! Why?!
eight_schools_cp_model_vectorized_no_gq <- 
    stan(file = "eight_schools_cp_model_vectorized_no_gq.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_cp_model_vectorized_no_gq) 
check_energy(eight_schools_cp_model_vectorized_no_gq)
# "158 of 4000 iterations ended with a divergence (3.95%)"
# [1] "Chain 4: E-BFMI = 0.152798760202411"
# [1] "E-BFMI below 0.2 indicates you may need to reparameterize your model"

# Model 2 --------------------------------------------------------
eight_schools_cp_model_looped <- 
    stan(file = "eight_schools_cp_model_looped.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_cp_model_looped) 
check_energy(eight_schools_cp_model_looped)
# "0 of 4000 iterations ended with a divergence (0%)"
# (No E-BFMI problem)
# 0.247, 0.159, 0.144, 0.151 seconds each chain


# Model 3 --------------------------------------------------------
eight_schools_ncp_model_vectorized <- 
    stan(file = "eight_schools_ncp_model_vectorized.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_ncp_model_vectorized) 
check_energy(eight_schools_ncp_model_vectorized)
# "1 of 4000 iterations ended with a divergence (0.025%)"
# (No E-BFMI problem)
# 0.192, 0.183, 0.243, 0.103 seconds each chain

# Model 3.5 --------------------------------------------------------
eight_schools_ncp_model_vectorized_no_gq <- 
    stan(file = "eight_schools_ncp_model_vectorized_no_gq.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_ncp_model_vectorized_no_gq) 
check_energy(eight_schools_ncp_model_vectorized_no_gq)
# "3 of 4000 iterations ended with a divergence (0.075%)"
# (No E-BFMI problem)
# 0.128, 0.278, 0.206, 0.121 seconds each chain

# Model 4 --------------------------------------------------------
eight_schools_ncp_model_looped <- 
    stan(file = "eight_schools_ncp_model_looped.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_ncp_model_looped) 
check_energy(eight_schools_ncp_model_looped)
# "0 of 4000 iterations ended with a divergence (0%)"
# (No E-BFMI problem)
# 0.356, 0.211, 0.239, 0.235 seconds

# Keep the no-gq fits under short names, then drop the originals so the loo loop
# below skips them (they have no log_lik in generated quantities).
cp_no_gq <- eight_schools_cp_model_vectorized_no_gq
ncp_no_gq <- eight_schools_ncp_model_vectorized_no_gq
rm(list = ls(pattern = "eight_schools_.*_no_gq"))

# For each remaining fit, extract the pointwise log-likelihood and then replace it
# with the corresponding loo object (stored under the same *_log_lik name);
# substr(item, 15, 10000) strips the "eight_schools_" prefix.
for(item in ls(pattern = "eight_schools_")){
  print(item)
  assign(paste0(substr(item, 15, 10000), "_log_lik"),
         extract_log_lik(eval(parse(text = item))))
  assign(paste0(substr(item, 15, 10000), "_log_lik"),
         loo(eval(parse(text = paste0(substr(item, 15, 10000), "_log_lik")))))
}

compare(cp_model_looped_log_lik, cp_model_vectorized_log_lik) # -2.8
compare(ncp_model_looped_log_lik, ncp_model_vectorized_log_lik) # -2.8
compare(ncp_model_vectorized_log_lik, cp_model_vectorized_log_lik) # 0

source("http://peterhaschke.com/Code/multiplot.R")
p1 <- plot(eight_schools_cp_model_looped, pars = c("mu", "tau", "theta"))
p2 <- plot(eight_schools_cp_model_vectorized, pars = c("mu", "tau", "theta"))
p3 <- plot(eight_schools_ncp_model_looped, pars = c("mu", "tau", "theta"))
p4 <- plot(eight_schools_ncp_model_vectorized, pars = c("mu", "tau", "theta"))
multiplot(p1, p2, p3, p4, cols = 2)

The sessionInfo() output is as follows:

R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] loo_1.1.0            bindrcpp_0.2         rstantools_1.3.0    
 [4] rstan_2.16.2         StanHeaders_2.16.0-1 magrittr_1.5        
 [7] dplyr_0.7.3          purrr_0.2.3          readr_1.1.1         
[10] tidyr_0.7.1          tibble_1.3.4         ggplot2_2.2.1.9000  
[13] tidyverse_1.1.1     

loaded via a namespace (and not attached):
 [1] reshape2_1.4.2     haven_1.1.0        lattice_0.20-35   
 [4] colorspace_1.3-2   stats4_3.4.1       rlang_0.1.2       
 [7] foreign_0.8-69     glue_1.1.1         modelr_0.1.1      
[10] readxl_1.0.0       matrixStats_0.52.2 bindr_0.1         
[13] plyr_1.8.4         stringr_1.2.0      munsell_0.4.3     
[16] gtable_0.2.0       cellranger_1.1.0   rvest_0.3.2       
[19] codetools_0.2-15   psych_1.7.8        labeling_0.3      
[22] inline_0.3.14      forcats_0.2.0      parallel_3.4.1    
[25] broom_0.4.2        Rcpp_0.12.12       KernSmooth_2.23-15
[28] scales_0.5.0.9000  jsonlite_1.5       gridExtra_2.3     
[31] mnormt_1.5-5       hms_0.3            digest_0.6.12     
[34] stringi_1.1.5      tools_3.4.1        lazyeval_0.2.0    
[37] pkgconfig_2.0.1    xml2_1.1.1         lubridate_1.6.0   
[40] rstudioapi_0.7     assertthat_0.2.0   httr_1.3.1        
[43] R6_2.2.2           nlme_3.1-131       compiler_3.4.1    

Aaaah, this took me a bit. The issue is:

y[j] ~ normal(theta[j], sigma);

should be

y[j] ~ normal(theta[j], sigma[j]);

Because sigma is a vector (each school has its own sigma), and vectorization kicks in if any of the arguments is a vector, the first statement was actually expanding out to:

y[j] ~ normal(theta[j], sigma[1]);
y[j] ~ normal(theta[j], sigma[2]);
y[j] ~ normal(theta[j], sigma[3]);
...

y[j] ~ normal(theta[j], sigma[J]);

I’ve seen this produce a bug before. I kinda feel like there should be a warning about this (@Bob_Carpenter, @mitzimorris, opinions? Maybe I’m missing something), but it’s valid syntax.
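
For reference, here is a sketch of the corrected loop (assuming the usual eight-schools declarations for J, y, sigma, mu, tau, and theta):

  model {
    for (j in 1:J) {
      theta[j] ~ normal(mu, tau);
      y[j] ~ normal(theta[j], sigma[j]);  // index sigma so each school gets its own sd
    }
  }

The fully vectorized statement y ~ normal(theta, sigma); was already fine, because y, theta, and sigma are all length-J there.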

the same vectorized model with and without the generated quantities computation has different divergences and E-BFMIs

There should definitely be no difference here. Maybe let the random seeds vary and see whether things agree on average?

these disparities are smaller in the non-centered versions.

Make sure to fix the sigma issue in these as well.

Hope that helps!


Hi Ben, thanks again for replying so quickly! :) Much appreciated.

Holy cheesecakes, yes, I did miss that. When that’s corrected (from sigma to sigma[j]), the loo comparison correctly yields zero, and the centered-and-looped version’s divergences increase. Output for the corrected Models 2 and 4 is below.

However, it is still intriguing to me that

  1. when the model computes generated quantities, the divergences change compared to when it doesn’t (i.e., Model 1 vs. 1.5, Model 3 vs. 3.5),
  2. even when corrected, divergences differ between vectorized and non-vectorized models that are otherwise the same, with the same seed (seed = 194838) (I have yet to follow up on your suggestion of random seeds and averaging the results), and
  3. the disparity is (still) smaller in the non-centered versions.

I attach corrected models 2 and 4.

eight_schools_ncp_model_looped.stan (599 Bytes)
eight_schools_cp_model_looped.stan (500 Bytes)

# Model 2 --------------------------------------------------------
eight_schools_cp_model_looped <- 
    stan(file = "eight_schools_cp_model_looped.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_cp_model_looped) 
check_energy(eight_schools_cp_model_looped)
# "105 of 4000 iterations ended with a divergence (2.625%)"
# (No E-BFMI problem)
# 0.203, 0.260, 0.203, 0.186 seconds each chain

# Model 4 --------------------------------------------------------
eight_schools_ncp_model_looped <- 
    stan(file = "eight_schools_ncp_model_looped.stan", 
         data = schools_dat, seed = 194838, 
         iter = 2000, chains = 4)

check_div(eight_schools_ncp_model_looped) 
check_energy(eight_schools_ncp_model_looped)
# "0 of 4000 iterations ended with a divergence (0%)"
# (No E-BFMI problem)
# 0.107, 0.112, 0.095, 0.209 seconds

Calling the RNG in generated quantities pulls from the common RNG deep in Stan that’s also used for generating the Hamiltonian Monte Carlo transitions. In other words, once you start using RNGs in generated quantities you no longer get exactly the same Markov chains, and indeed you will see variation in divergences and other chain properties. Interestingly enough, Hamiltonian Monte Carlo mixes so quickly that small changes like this cause appreciable differences in the exact chain realization, but the expectation estimates (mean, median, etc.) should all be the same within the MCMC standard error.

To confirm, compare no generated quantities, to a generated quantities block that doesn’t use an RNG, to one that does.
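
For example, a minimal sketch of the two variants, assuming the data and parameter names used in the models above:

  generated quantities {
    vector[J] log_lik;
    vector[J] y_tilde;
    for (j in 1:J) {
      log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);  // no RNG call: the chains are untouched
      y_tilde[j] = normal_rng(theta[j], sigma[j]);          // RNG call: advances the shared RNG stream
    }
  }

Dropping the y_tilde declaration and the normal_rng line should then reproduce the no-generated-quantities chains exactly.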


Hi Michael,

I see. Yes, you are right. When I comment out the RNGs, the output is identical to the no-generated-quantities case. So that mystery is solved!

I’m still curious as to why vectorized vs. non-vectorized versions yield different diagnostics. It’s especially intriguing if you look at n_eff and Rhat. The following are summaries from the above models (looped versions corrected):

> summary(eight_schools_cp_model_vectorized, pars = c("mu", "tau", "theta"))$summary
         mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff Rhat
mu       4.37   0.142 3.38 -2.21 2.171 4.34 6.66  11.3   562 1.00
tau      3.98   0.151 2.91  0.73 1.864 3.23 5.29  11.4   368 1.01
theta[1] 6.32   0.194 5.79 -3.25 2.664 5.59 9.45  19.5   894 1.00
theta[2] 5.01   0.149 4.77 -4.13 2.038 4.82 8.01  14.6  1026 1.00
theta[3] 3.91   0.161 5.28 -7.53 0.860 4.07 7.21  14.0  1077 1.00
theta[4] 4.70   0.149 4.91 -5.17 1.692 4.56 7.88  14.3  1082 1.00
theta[5] 3.59   0.152 4.64 -6.04 0.788 3.66 6.59  12.4   934 1.00
theta[6] 4.03   0.161 4.92 -6.27 1.096 3.99 7.12  13.6   931 1.00
theta[7] 6.52   0.187 5.21 -2.30 3.029 6.01 9.49  18.7   777 1.00
theta[8] 4.88   0.159 5.64 -6.46 1.713 4.68 8.06  16.4  1255 1.00

> summary(eight_schools_cp_model_looped, pars = c("mu", "tau", "theta"))$summary
         mean se_mean   sd   2.5%   25%  50%  75% 97.5% n_eff Rhat
mu       4.60   0.320 3.45 -2.176 2.212 4.55 7.00  10.8   116 1.03
tau      3.99   0.285 3.12  0.664 1.765 3.14 5.30  12.2   120 1.03
theta[1] 6.55   0.214 5.78 -3.643 2.818 6.16 9.69  20.6   731 1.00
theta[2] 5.30   0.185 4.79 -4.173 2.320 5.31 8.32  15.3   669 1.01
theta[3] 4.11   0.230 5.62 -7.886 1.048 4.26 7.75  14.4   600 1.02
theta[4] 5.01   0.161 4.73 -4.273 2.020 4.85 8.14  14.2   860 1.01
theta[5] 3.68   0.342 5.01 -6.630 0.715 3.89 7.18  12.3   215 1.02
theta[6] 4.23   0.276 5.04 -6.501 1.270 4.46 7.72  13.6   334 1.02
theta[7] 6.79   0.199 5.22 -2.639 3.274 6.52 9.96  18.1   684 1.00
theta[8] 5.22   0.196 5.62 -5.943 1.951 5.01 8.57  17.0   820 1.01

> summary(eight_schools_ncp_model_vectorized, pars = c("mu", "tau", "theta"))$summary
         mean se_mean   sd   2.5%   25%  50%  75% 97.5% n_eff  Rhat
mu       4.35  0.0549 3.26 -2.161 2.098 4.40 6.61  10.5  3520 1.001
tau      3.58  0.0559 3.15  0.129 1.285 2.75 5.09  11.6  3178 1.000
theta[1] 6.28  0.0904 5.64 -3.059 2.738 5.69 8.99  20.0  3897 1.000
theta[2] 4.94  0.0715 4.52 -3.726 1.962 4.91 7.70  14.4  4000 0.999
theta[3] 3.88  0.0876 5.26 -7.506 1.016 4.21 7.05  13.8  3600 1.000
theta[4] 4.73  0.0748 4.73 -4.460 1.879 4.69 7.61  14.1  4000 0.999
theta[5] 3.67  0.0735 4.65 -6.154 0.905 3.85 6.73  12.3  4000 1.000
theta[6] 4.01  0.0752 4.76 -5.806 1.300 4.09 6.94  13.0  4000 1.000
theta[7] 6.34  0.0802 5.07 -2.438 3.016 5.92 9.04  17.8  4000 1.000
theta[8] 4.83  0.0842 5.33 -5.826 1.710 4.68 7.84  16.3  4000 1.000

> summary(eight_schools_ncp_model_looped, pars = c("mu", "tau", "theta"))$summary
         mean se_mean   sd  2.5%   25%  50%  75% 97.5% n_eff  Rhat
mu       4.37  0.0530 3.35 -2.19 2.124 4.40 6.66  10.9  4000 1.001
tau      3.61  0.0571 3.18  0.14 1.249 2.76 5.05  11.9  3110 1.000
theta[1] 6.26  0.0865 5.47 -2.94 2.842 5.75 9.04  19.1  4000 0.999
theta[2] 4.99  0.0727 4.60 -3.76 2.098 4.88 7.77  14.7  4000 1.000
theta[3] 3.86  0.0871 5.51 -8.26 1.078 4.12 7.15  13.8  4000 1.000
theta[4] 4.78  0.0766 4.85 -5.03 1.913 4.75 7.66  14.5  4000 1.000
theta[5] 3.61  0.0761 4.81 -7.01 0.882 3.92 6.81  12.2  4000 1.001
theta[6] 4.04  0.0761 4.81 -6.35 1.246 4.27 7.12  12.9  4000 1.000
theta[7] 6.35  0.0803 5.08 -2.27 3.128 5.78 9.11  18.4  4000 1.000
theta[8] 4.82  0.0831 5.26 -5.56 1.874 4.82 7.81  15.5  4000 1.000

What’s interesting is that, while the estimated means and medians are not that different,

  • With the centered parameterization, the vectorized version yields markedly better n_eff and Rhat.
  • With the non-centered parameterization, the looped version does slightly better, although the difference is small.

Would this behavior be general? In the non-centered case, I wonder whether vectorizing trades a slight loss in diagnostics for speed, while in the centered case the vectorized version is always the better deal? I don’t know.

For the record, a difference also shows up when there are no generated quantities (looped vs. vectorized, centered versions). Here, oddly enough, the looped version performed better…

> summary(eight_schools_cp_model_looped_no_gq, pars = c("mu", "tau", "theta"))$summary
         mean se_mean   sd   2.5%   25%  50%  75% 97.5% n_eff Rhat
mu       4.32   0.142 3.12 -2.000 2.312 4.46 6.42  10.5   481 1.01
tau      3.70   0.199 2.93  0.574 1.543 2.87 4.93  11.5   217 1.03
theta[1] 6.13   0.183 5.30 -3.257 3.055 5.56 8.60  18.8   841 1.01
theta[2] 4.94   0.165 4.63 -4.441 2.207 4.83 7.64  14.5   785 1.00
theta[3] 3.74   0.148 5.23 -7.716 0.981 4.32 6.82  13.0  1245 1.00
theta[4] 4.70   0.138 4.79 -5.084 2.031 4.49 7.23  14.5  1204 1.00
theta[5] 3.44   0.161 4.68 -7.149 0.839 3.95 6.41  11.7   840 1.00
theta[6] 3.96   0.144 4.50 -5.651 1.347 4.21 6.69  12.6   970 1.00
theta[7] 6.32   0.208 5.05 -2.544 3.373 5.77 8.82  18.2   589 1.01
theta[8] 4.83   0.158 5.26 -6.020 1.948 4.87 7.59  16.0  1107 1.00

> summary(eight_schools_cp_model_vectorized_no_gq, pars = c("mu", "tau", "theta"))$summary
         mean se_mean   sd   2.5%  25%  50%  75% 97.5% n_eff Rhat
mu       4.44   0.161 3.19 -1.771 2.33 4.40 6.56  11.1 392.5 1.02
tau      3.68   0.331 3.10  0.606 1.52 2.88 4.94  11.1  87.7 1.04
theta[1] 6.16   0.227 5.44 -2.945 3.11 5.44 8.77  18.6 576.8 1.01
theta[2] 4.95   0.167 4.58 -3.969 2.20 4.43 7.59  14.6 752.5 1.01
theta[3] 3.87   0.184 5.09 -6.841 1.01 4.06 7.01  13.3 764.3 1.01
theta[4] 4.74   0.163 4.66 -4.350 2.11 4.53 7.45  14.3 815.3 1.01
theta[5] 3.59   0.181 4.55 -6.435 1.04 3.85 6.33  12.3 631.3 1.01
theta[6] 4.06   0.176 4.75 -5.933 1.39 4.33 6.89  13.1 727.2 1.01
theta[7] 6.23   0.200 5.03 -2.514 3.28 5.51 8.85  17.4 632.8 1.01
theta[8] 4.89   0.178 5.25 -5.503 1.90 4.72 7.82  16.0 866.4 1.01

p.s. While I’m asking this… there’s no way to speed up/vectorize the following, is there?

  for(j in 1:J){
    log_lik[j] = normal_lpdf(y[j] | theta[j], sigma[j]);
    y_tilde[j] = normal_rng(theta[j], sigma[j]);
  }

If this is in the generated quantities block, then the vectorized version would have about the same speed, especially because there are no shared parameters. I think vectorized rngs are already in Stan. _lpdf functions that would return a vector instead of a sum are not yet in Stan (and this has low priority). I’ve noticed that sometimes adding these will increase wall-clock time because of the additional I/O rather than the computation.

Vectorization should not have a huge impact on results. That said, because floating-point arithmetic is not associative, the vectorized and looped versions can accumulate the log density in slightly different orders, so we can’t guarantee exact reproducibility; even minuscule differences can cause the chains to decouple.
The results you have are all consistent with each other with respect to the expected MCMC variation (the se_mean column).
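
A tiny illustration of the non-associativity (plain IEEE double-precision arithmetic, nothing Stan-specific):

  (0.1 + 0.2) + 0.3  evaluates to  0.6000000000000001
  0.1 + (0.2 + 0.3)  evaluates to  0.6

So summing the same per-school log-density terms in a different order can already differ in the last bits, which is enough for the realized chains to drift apart over many leapfrog steps.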

Regarding centered vs. non-centered, you can’t compare numbers until the diagnostics are clear. First check divergences and E-BFMI, then Rhat (ideally with multiple chains). Only if those are all clear can you move on to comparing speeds and values.


I think vectorized rngs are already in Stan.

Still in progress ^^

Hi Michael, Aki, and Ben,

Thank you so much. I see. Since vectorized RNGs are not there, I’ll just run normal_lpdf and normal_rng as is, in a loop.

Michael, thanks for pointing out that all comparisons should be done only after the diagnostics are clear. I am actually having great difficulty driving divergences down to 0% while keeping E-BFMIs above 0.2 (again, not for this schools model; this is just an example…), but I presume from what you say that I will not be able to declare that I have any “results” until I do. This is quite frustrating…

I have so far

  1. non-centered all the parameters,
  2. vectorized,
  3. ran with quite a lot of iterations (iter = 100000, warmup = 2000, and thin = 10; I saw somewhere on the forum that a developer said he never needed more than 100,000 iterations with the correct model), and
  4. tweaked controls to control = list(adapt_delta = 0.99, max_treedepth = 15). It’s taking around one day to run a single chain with these iterations/controls.

I have yet to specify a covariance structure among the parameters so this is what I’m thinking of doing next. If you have some further suggestions I would be very grateful to hear them.

I’ll choose Ben’s first answer as the solution for this issue.

Ben, thanks for pointing out that

Michael said that, but Ben agrees.

he never needed more than 100,000 iterations with

This is definitely way too many. The default 1k is probably plenty for most models (which I assume is why it’s the default). It all comes down to the n_eff value (estimated effective sample size) that the interfaces report. When folks talk about error in Monte Carlo methods, they talk about how it scales as 1 / sqrt(N). Because we’re doing MCMC, the samples aren’t independent, so we can’t just plug the raw number of samples into those error estimates; n_eff is an estimate of what that N effectively is.
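
To make that concrete with numbers already posted in this thread (just the standard Monte Carlo standard error formula):

  MCSE ≈ sd / sqrt(n_eff)
  e.g. for mu in the looped CP summary above: 3.45 / sqrt(116) ≈ 0.32, which is the reported se_mean.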

I am actually having great difficulty driving divergences down to 0% while keeping E-BFMIs above 0.2
tweaked controls to control = list(adapt_delta = 0.99, max_treedepth = 15)
It’s taking around one day to run a single chain with these iterations/controls.

Maybe start a new thread with this specific model? None of this is good :D, and it’s all kinda model specific.

somewhere from the Forum where a developer said

Eeeehhh, developer tag doesn’t mean infallible. I’ve said some wrong things on here for sure.

Oops! Yes, that was Michael’s.
Thank you so much, Ben. I’ll definitely open a new thread for the model.

Somebody has to be the bad cop!

Fitting complex models often is. Still, is frustration over diagnostics better or worse than not knowing whether your model is fitting accurately or not?

As @bbbales2 notes, that’s way too many iterations. Stick with the defaults for now, as a longer chain won’t really help anything.

If increasing adapt_delta doesn’t remove the divergences then there are deeper problems going on with your model. What you want to do is try to identify where the divergences are concentrating in parameter space, to see if you can figure out what property of the model is causing the problems. See for example Robust Statistical Workflow with RStan and Diagnosing Biased Inference with Divergences.

Sometimes you just need tighter priors to cut off awkward parts of parameter space that aren’t realistic anyway.
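
For example, a non-centered eight-schools model with somewhat tighter, weakly informative priors might look like the sketch below. The normal(0, 10) and half-normal(0, 5) priors are illustrative choices on my part, not the priors in the attached models:

  data {
    int<lower=0> J;
    vector[J] y;
    vector<lower=0>[J] sigma;
  }
  parameters {
    real mu;
    real<lower=0> tau;
    vector[J] theta_tilde;
  }
  transformed parameters {
    vector[J] theta;
    theta = mu + tau * theta_tilde;
  }
  model {
    mu ~ normal(0, 10);          // weakly informative rather than flat
    tau ~ normal(0, 5);          // half-normal, thanks to the <lower=0> constraint
    theta_tilde ~ normal(0, 1);
    y ~ normal(theta, sigma);
  }

The prior on tau cuts off the extreme-scale region without changing the substantive inferences much, which is often enough to get rid of lingering divergences.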

In any case, a new thread for the modeling problems is a good idea.


Thank you so much, Michael. Reading your post makes things much clearer! For future reference, I was mistakenly thinking that

  1. increasing the number of iterations would ultimately solve things, given a high enough adapt_delta, and
  2. if the number of divergences is tiny and n_eff/Rhat are fine, I can settle for it.

And yes, indeed, it’s better to know that diagnostics are raising red flags than to live with inaccurate models. You should allow the users to select multiple answers as solutions. :)

At the moment, I am having difficulty translating the geometric plots produced as in your divergences-and-bias case study into a constructive modification of the model. But let me first, as you suggested,

  1. tighten the priors, and
  2. comb through the Forum / previous Google Group posts to see if I can find more useful hints.

I’ll come back to the Forum if this does not work out well. Thank you once again.

Great. Keep in mind that fixing complex models like these is really hard, even for those of us who are constantly fitting models. The problem is that, because each model is unique, its pathologies are also unique, and so there are no general solutions. Instead you have to really interrogate your model, guided by the informative diagnostics and your own expertise in the model, to identify what could be causing problems.
