Divergent transitions after warmup using brms

fitting-issues

#1

Hello,

I’m encountering difficulties concerning divergent transitions in brms (system information is pasted at the end of the post).

I have 2 data sets (dat_c.txt (13.9 KB) and dat_v.txt (14.3 KB)) that stem from the same experiment, in which 3 extremely-rare-to-find participants took part (subject_id) and 16 items were included (item_id). There are 3 independent variables (var_c, var_p, var_m), each with 2 levels coded as 1 and -1. The dependent variable (dv) is the duration of some kind of a phoneme, but it is not the same type of phoneme in both data sets, which is why I want to fit two separate models, one for each dependent variable.

Since I don’t have much data, my initial intention was to estimate only random intercepts for participants and items, in addition to the fixed effects, as follows:

# random intercepts model for dat_c
fit_c_ranInt <- brm(dv ~ var_c * var_p * var_m + (1 | subject_id) + (1 | item_id),
                    data = dat_c, chains = 4, iter = 3000,
                    prior = c(set_prior("normal(0, 50)", class = "b"),
                              set_prior("normal(0, 50)", class = "Intercept"),
                              set_prior("normal(0, 50)", class = "sigma"),
                              set_prior("normal(0, 50)", class = "sd")),
                    control = list(adapt_delta = 0.99, max_treedepth = 15, stepsize = 0.001))


# random intercepts model for dat_v
fit_v_ranInt <- brm(dv ~ var_c * var_p * var_m + (1 | subject_id) + (1 | item_id),
                    data = dat_v, chains = 4, iter = 3000,
                    prior = c(set_prior("normal(0, 50)", class = "b"),
                              set_prior("normal(0, 50)", class = "Intercept"),
                              set_prior("normal(0, 50)", class = "sigma"),
                              set_prior("normal(0, 50)", class = "sd")),
                    control = list(adapt_delta = 0.99, max_treedepth = 15, stepsize = 0.001))

Now, when I run the fit_c_ranInt model, there are 4 divergent transitions. But when I fit a more complex model, with the most complex random effects structure allowed by the experimental design (fit_c_complex; pasted below), no divergent transitions occur and everything is fine.

By contrast, the random-intercepts model for the other dependent variable (fit_v_ranInt) produces 27 divergent transitions, whereas the more complex fit_v_complex model produces 6 divergent transitions.

# complex model for dat_c
fit_c_complex <- brm(dv ~ var_c * var_p * var_m + (var_c * var_p * var_m | subject_id) + (var_c * var_m | item_id),
                     data = dat_c, chains = 4, iter = 3000,
                     prior = c(set_prior("normal(0, 50)", class = "b"),
                               set_prior("normal(0, 50)", class = "Intercept"),
                               set_prior("normal(0, 50)", class = "sigma"),
                               set_prior("normal(0, 50)", class = "sd"),
                               set_prior("lkj(2)", class = "cor")),
                     control = list(adapt_delta = 0.99, max_treedepth = 15, stepsize = 0.001))


# complex model for dat_v
fit_v_complex <- brm(dv ~ var_c * var_p * var_m + (var_c * var_p * var_m | subject_id) + (var_c * var_m | item_id),
                     data = dat_v, chains = 4, iter = 3000,
                     prior = c(set_prior("normal(0, 50)", class = "b"),
                               set_prior("normal(0, 50)", class = "Intercept"),
                               set_prior("normal(0, 50)", class = "sigma"),
                               set_prior("normal(0, 50)", class = "sd"),
                               set_prior("lkj(2)", class = "cor")),
                     control = list(adapt_delta = 0.99, max_treedepth = 15, stepsize = 0.001))

My questions are:

  1. Why do I get (more) divergent transitions in the less complex model, and fewer or none in the more complex one, given that I have so little data?
  2. How can I solve the divergence problem for dat_v, for which I didn’t manage to get a well-working model?
  3. Any idea why it is harder to eliminate the divergent transitions for dat_v than for dat_c?

Thanks
Yair

System information:

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/R/3.5.2/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8       
 [4] LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.18.2       StanHeaders_2.18.1 brms_2.7.0         ggplot2_3.1.0      Rcpp_1.0.0        

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-6    gtools_3.8.1         threejs_0.3.1        shiny_1.2.0          assertthat_0.2.0    
 [6] stats4_3.5.2         yaml_2.2.0           backports_1.1.3      pillar_1.3.1         lattice_0.20-38     
[11] glue_1.3.0           digest_0.6.18        promises_1.0.1       colorspace_1.4-0     htmltools_0.3.6     
[16] httpuv_1.4.5.1       Matrix_1.2-15        plyr_1.8.4           dygraphs_1.1.1.6     pkgconfig_2.0.2     
[21] purrr_0.3.0          xtable_1.8-3         mvtnorm_1.0-8        scales_1.0.0         processx_3.2.1      
[26] later_0.7.5          tibble_2.0.1         bayesplot_1.6.0      DT_0.5               withr_2.1.2         
[31] shinyjs_1.0          lazyeval_0.2.1       cli_1.0.1            magrittr_1.5         crayon_1.3.4        
[36] mime_0.6             ps_1.3.0             nlme_3.1-137         xts_0.11-2           pkgbuild_1.0.2      
[41] colourpicker_1.0     prettyunits_1.0.2    rsconnect_0.8.12     tools_3.5.2          loo_2.0.0           
[46] matrixStats_0.54.0   stringr_1.3.1        munsell_0.5.0        bindrcpp_0.2.2       callr_3.1.1         
[51] compiler_3.5.2       rlang_0.3.1          grid_3.5.2           ggridges_0.5.1       rstudioapi_0.8      
[56] htmlwidgets_1.3      crosstalk_1.0.0      igraph_1.2.2         miniUI_0.1.1.1       base64enc_0.1-3     
[61] codetools_0.2-15     gtable_0.2.0         inline_0.3.15        abind_1.4-5          markdown_0.9        
[66] reshape2_1.4.3       R6_2.3.0             gridExtra_2.3        rstantools_1.5.1     zoo_1.8-4           
[71] bridgesampling_0.6-0 dplyr_0.7.8          bindr_0.1.1          shinystan_2.5.0      shinythemes_1.1.2   
[76] stringi_1.2.4        parallel_3.5.2       tidyselect_0.2.5     coda_0.19-2  


#2

Without a deep understanding of your model and data, here are my two cents:

  • One reason divergent transitions can occur is that the model is a very bad fit to the data.
    • Adding more flexibility can make the model fit less badly, but it likely does not remove the root cause.
  • By default, brms assumes a normally distributed response (the family argument). Your data are not normally distributed, as durations are always positive. A lognormal, exponential, or gamma family might work better, and is also likely a better representation of the data-generating process. I know little about phoneme duration, but I believe the available theory would tell you which of those distributions is most likely to match reality.
  • You can test whether your model and priors are sensible via prior predictive checks (see https://arxiv.org/abs/1709.01449 for more details).
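
As a rough illustration of the last two points, here is a minimal base-R sketch of a prior predictive simulation. It is not from the thread and makes several assumptions: it reuses the normal(0, 50) priors from the original post, treats the prior on sigma as half-normal (sigma is constrained to be positive), and ignores the predictors and group-level effects for simplicity. The log-scale priors in the second half are hypothetical values picked only for illustration.

```r
set.seed(1)
n_draws <- 4000

# Gaussian likelihood with the priors from the original post
# (intercept-only simplification; predictors and random effects are ignored)
intercept <- rnorm(n_draws, mean = 0, sd = 50)
sigma     <- abs(rnorm(n_draws, mean = 0, sd = 50))  # half-normal: sigma > 0
y_gauss   <- rnorm(n_draws, mean = intercept, sd = sigma)

# Share of simulated "durations" that are negative, i.e. impossible
prop_negative <- mean(y_gauss < 0)

# Lognormal likelihood; these log-scale priors are hypothetical
log_intercept <- rnorm(n_draws, mean = 4, sd = 1)
sdlog_draws   <- abs(rnorm(n_draws, mean = 0, sd = 1))
y_lnorm       <- rlnorm(n_draws, meanlog = log_intercept, sdlog = sdlog_draws)

print(prop_negative)   # compare against what is physically possible
print(range(y_lnorm))  # lognormal draws are positive by construction
```

Because the Gaussian priors are symmetric around zero, roughly half of the simulated durations come out negative, which is one way to see that the default family and these priors do not describe a duration. The lognormal draws cannot be negative, although their spread still depends on the chosen priors.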

#3

Thanks for the input!
These already sound like good starting points. I’ll give them a try.


#4

Just a brief follow-up on this: the exponential family turns out to be reasonable enough, and the models work without problems.
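
The refit itself is not shown in the thread. A minimal sketch of what such a model might look like follows, under stated assumptions: dat_sim is a hypothetical, fully crossed stand-in for dat_v with simulated durations, the normal(0, 1) prior on the log-scale coefficients is an illustrative choice (the priors actually used are not given), and the brms part only runs if brms is installed.

```r
set.seed(2)

# Hypothetical stand-in data: 3 subjects x 16 items x 2 x 2 x 2 conditions.
# This is NOT the real dat_v; the true design may not be fully crossed.
dat_sim <- expand.grid(subject_id = factor(1:3),
                       item_id    = factor(1:16),
                       var_c = c(-1, 1), var_p = c(-1, 1), var_m = c(-1, 1))
dat_sim$dv <- rexp(nrow(dat_sim), rate = 1 / 80)  # arbitrary mean of 80

if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)

  # Random-intercepts model with an exponential likelihood (log link)
  fit_sim_exp <- brm(dv ~ var_c * var_p * var_m + (1 | subject_id) + (1 | item_id),
                     data = dat_sim, family = exponential(),
                     chains = 4, iter = 3000,
                     prior = set_prior("normal(0, 1)", class = "b"),  # assumed prior
                     control = list(adapt_delta = 0.99))

  # Count post-warmup divergent transitions
  np <- nuts_params(fit_sim_exp)
  n_divergent <- sum(np$Value[np$Parameter == "divergent__"])
  print(n_divergent)

  # Posterior predictive check: do replicated data resemble the observed dv?
  print(pp_check(fit_sim_exp))
}
```

Note that with a log link the coefficients live on the log scale, so priors as wide as normal(0, 50) would imply implausible durations; that is why a narrower prior is assumed here.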

Thanks again for your help!