Stan::variational::normal_meanfield::calc_grad - can be falsely driven by tranformed parameters?

I get this error when I try to run my (apparently well behaved, see below) model with vb

Chain 1: Begin eta adaptation.
Chain 1: Iteration:   1 / 250 [  0%]  (Adaptation)
Chain 1: Iteration:  50 / 250 [ 20%]  (Adaptation)
Chain 1: Iteration: 100 / 250 [ 40%]  (Adaptation)
Chain 1: Iteration: 150 / 250 [ 60%]  (Adaptation)
Chain 1: Iteration: 200 / 250 [ 80%]  (Adaptation)
Chain 1: Success! Found best value [eta = 1] earlier than expected.
Chain 1: 
Chain 1: Begin stochastic gradient ascent.
Chain 1:   iter             ELBO   delta_ELBO_mean   delta_ELBO_med   notes 
Error in sampler$call_sampler(c(args, dotlist)) : 
  stan::variational::normal_meanfield::calc_grad: The number of dropped evaluations has reached its maximum amount (10). Your model may be either severely ill-conditioned or misspecified.

However when I look for pathologies, I see my “worst” behaving parameters are actually pretty fine

fit %>% summary(pars=c("lambda_mu", "exposure_rate", "sigma_correction_param" , "prop_1", "prop_a", "prop_b", "prop_c", "prop_d", "prop_e")) %$% summary %>% as_tibble(rownames="par") %>% arrange(n_eff)
# A tibble: 1,090 x 11
   par                           mean se_mean     sd   `2.5%`   `25%`   `50%`  `75%` `97.5%` n_eff  Rhat
   <chr>                        <dbl>   <dbl>  <dbl>    <dbl>   <dbl>   <dbl>  <dbl>   <dbl> <dbl> <dbl>
 1 prop_1[13,1]               0.0994  0.00675 0.0383 0.00907  0.0824  0.102   0.124   0.164   32.2  1.10
 2 prop_1[13,4]               0.893   0.00677 0.0388 0.823    0.867   0.890   0.912   0.983   32.8  1.09
 3 prop_1[17,4]               0.799   0.0112  0.0983 0.631    0.722   0.781   0.882   0.979   76.6  1.01
 4 prop_1[17,2]               0.186   0.0106  0.0942 0.0157   0.106   0.201   0.262   0.343   78.5  1.01
 5 prop_a[15,1]               0.244   0.00832 0.0786 0.0908   0.190   0.253   0.298   0.387   89.2  1.02
 6 prop_1[12,2]               0.00973 0.00149 0.0143 0.000491 0.00140 0.00280 0.0114  0.0503  91.8  1.03
 7 prop_1[12,4]               0.986   0.00154 0.0150 0.941    0.985   0.993   0.995   0.997   94.6  1.03
 8 prop_a[15,4]               0.558   0.00673 0.0708 0.434    0.505   0.555   0.604   0.697  111.   1.02
 9 sigma_correction_param[22] 0.901   0.0254  0.268  0.257    0.735   0.921   1.09    1.36   112.   1.04
10 prop_1[15,4]               0.850   0.00955 0.102  0.625    0.772   0.882   0.938   0.974  115.   1.00
# … with 1,080 more rows

A part few samples. All parameters as far as I can see are normally distributed, fairly symmetric.

However, when I look at the transformed parameters, the sampling metrics show “bad behaviour”

fit %>% summary %$% summary %>% as_tibble(rownames="par") %>% arrange(n_eff)
# A tibble: 17,531 x 11
   par          mean se_mean    sd `2.5%`  `25%`  `50%`  `75%` `97.5%` n_eff    Rhat
   <chr>       <dbl>   <dbl> <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl>   <dbl>
 1 sigma[6]   1.50         0     0 1.50   1.50   1.50   1.50    1.50    1.51   0.990
 2 sigma[37]  1.52         0     0 1.52   1.52   1.52   1.52    1.52    1.51 Inf    
 3 sigma[78]  1.55         0     0 1.55   1.55   1.55   1.55    1.55    1.51   1.47 
 4 sigma[128] 1.40         0     0 1.40   1.40   1.40   1.40    1.40    1.51   1.47 
 5 sigma[129] 0.753        0     0 0.753  0.753  0.753  0.753   0.753   1.51   0.990
 6 sigma[143] 0.0917       0     0 0.0917 0.0917 0.0917 0.0917  0.0917  1.51 NaN    
 7 sigma[190] 3.11         0     0 3.11   3.11   3.11   3.11    3.11    1.51   1.47 
 8 sigma[218] 3.22         0     0 3.22   3.22   3.22   3.22    3.22    1.51   0.990
 9 sigma[273] 0.390        0     0 0.390  0.390  0.390  0.390   0.390   1.51   1.47 
10 sigma[287] 3.07         0     0 3.07   3.07   3.07   3.07    3.07    1.51 Inf    

Can it be that the error for vb is driven by those parameters? Would this be a false alarm?

Thanks.

My session

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.4 (Final)

Matrix products: default
BLAS: /stornext/System/data/bioinf-centos6/bioinf-software/bioinfsoftware/R/R-3.5.1/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/bioinf-centos6/bioinf-software/bioinfsoftware/R/R-3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rstan_2.19.2          StanHeaders_2.18.1-10 limma_3.38.2          data.tree_0.7.8       magrittr_1.5          foreach_1.4.4        
 [7] forcats_0.4.0         stringr_1.4.0         dplyr_0.8.0.1         purrr_0.3.0           readr_1.1.1           tidyr_0.8.2          
[13] tibble_2.0.1          ggplot2_3.1.0         tidyverse_1.2.1      

loaded via a namespace (and not attached):
 [1] nlme_3.1-137              matrixStats_0.54.0        lubridate_1.7.4           RColorBrewer_1.1-2        httr_1.4.0               
 [6] LaplacesDemon_16.1.1      tools_3.5.1               utf8_1.1.4                R6_2.4.0                  lazyeval_0.2.1           
[11] colorspace_1.4-0          withr_2.1.2               prettyunits_1.0.2         processx_3.2.1            tidyselect_0.2.5         
[16] gridExtra_2.3             mnormt_1.5-5              compiler_3.5.1            cli_1.0.1                 rvest_0.3.2              
[21] arrayhelpers_1.0-20160527 xml2_1.2.0                influenceR_0.1.0          plotly_4.8.0              labeling_0.3             
[26] diptest_0.75-7            scales_1.0.0              psych_1.8.4               ggridges_0.5.1            callr_3.1.1              
[31] digest_0.6.18             foreign_0.8-70            pkgconfig_2.0.2           htmltools_0.3.6           htmlwidgets_1.3          
[36] rlang_0.3.1               readxl_1.1.0              rstudioapi_0.9.0          visNetwork_2.0.4          svUnit_0.7-12            
[41] jsonlite_1.6              gtools_3.8.1              rgexf_0.15.3              inline_0.3.15             modes_0.7.0              
[46] loo_2.0.0                 Matrix_1.2-14             Rcpp_1.0.0                munsell_0.5.0             fansi_0.4.0              
[51] viridis_0.5.1             stringi_1.3.1             yaml_2.2.0                MASS_7.3-50               pkgbuild_1.0.2           
[56] plyr_1.8.4                ggstance_0.3              grid_3.5.1                parallel_3.5.1            listenv_0.7.0            
[61] crayon_1.3.4              lattice_0.20-35           haven_1.1.2               hms_0.4.2                 knitr_1.21               
[66] ps_1.3.0                  pillar_1.3.1              igraph_1.2.4              widyr_0.1.1               reshape2_1.4.3           
[71] codetools_0.2-15          stats4_3.5.1              XML_3.98-1.17             glue_1.3.0.9000           downloader_0.4           
[76] data.table_1.12.0         modelr_0.1.2              cellranger_1.1.0          gtable_0.2.0              future_1.11.1.1          
[81] assertthat_0.2.0          xfun_0.5                  broom_0.4.5               coda_0.19-1               viridisLite_0.3.0        
[86] iterators_1.0.10          Rook_1.1-1                tidybayes_1.0.1           DiagrammeR_1.0.0          globals_0.12.4           
[91] brew_1.0-6 

This is confusing as we don’t have well defined R-hat for variational inference. Our usual interfaces don’t compute that. Thus, it’s not clear what this Rhat you show in your tables is.

I run both with sampling, and then try vb.

Basically would be enough to know if (in either sampling or vb) the transformed parameters are considered for convergence metrics warnings. If not, means that despite what I see my model have issues.

calc_grad fails if some draws from the variational approximation in unconstrained space are such that gradients with those parameter values can’t be computed. You may get this work with better initial values, but since you can currently init only the mode of the variational approximation and not the covariance it may still fail.

If you get the above error you shouldn’t get any posterior draws.

Can you bit more clear which error and which diagnostic is for which.

Convergence warnings in which interface? The calc_grad error is not a convergence warning.

With NUTS is apparently all good

With vb it fails.

However, I am realisising that I should look more at Rhat and neff, and less relying on divergence warnings and traceplots.

I think that few parameter were not mixing, and therefore is the model not defined enough for vb.

Thanks afor your time Aki.

Rhat is (currently) only for MCMC (NUTS is also MCMC). Divergences and traceplots are only for MCMC. How are you checking that vb fails?

Mixing is a term related to MCMC but not to vb. How do you infer from bad mixing in MCMC that vb would be bad?

Yes, apologies @stemangiola, there’s an issue open to fix this so we can give you diagnostics for VB that make sense. The current diagnostics are only for MCMC and are misleading.

1 Like

My understanding that for vb to work, the model convergence/mixing should be perfect, with parameters around the scale normal(0,1). It this correct?