R crashes with loo(reloo = TRUE) and with kfold()

When trying to compare models in brms, R crashes every time I use reloo = TRUE or kfold. This happens even with the simplest models.

I’m running brms 2.10.0 on R 3.6.0 and Windows 10.

Thanks

Unfortunately, I cannot offer any advice based on the provided information. I am running brms in the same setup and it works for me, so this is not a general problem. In other words, we need more information to have a chance of providing advice.

What information would help?

Let's start with some code that fails for you, along with the sessionInfo().

Here are the current models I'm working with and the sessionInfo(). Both kfold() and loo(reloo = TRUE) crash the session almost instantly.

fit0 <- brm(cases ~ 1,
            data = d,
            autocor = cor_car(W = w1, ~ 1 | ID, type = "bym2"),
            iter = 20000,
            family = poisson(),
            control = list(adapt_delta = 0.999, max_treedepth = 20),
            cores = 4)

fit1 <- brm(cases ~ t2(Pop, bs = "tp", k = 10) + t2(z.Avg, bs = "tp", k = 10),
            data = d,
            autocor = cor_car(W = w1, ~ 1 | ID, type = "bym2"),
            iter = 20000,
            family = poisson(),
            control = list(adapt_delta = 0.999, max_treedepth = 20),
            cores = 4)

kfold(fit0, fit1, K = 10)

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] viridis_0.5.1 viridisLite_0.3.0 forcats_0.4.0 stringr_1.4.0 dplyr_0.8.3 purrr_0.3.2
[7] readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0 tidyverse_1.2.1 brms_2.10.0
[13] Rcpp_1.0.2 raster_3.0-2 spdep_1.1-2 spData_0.3.0 sp_1.3-1 sf_0.7-7

loaded via a namespace (and not attached):
[1] colorspace_1.4-1 deldir_0.1-23 class_7.3-15 ggridges_0.5.1 rsconnect_0.8.15
[6] markdown_1.1 base64enc_0.1-3 rstudioapi_0.10 rstan_2.19.2 DT_0.8
[11] lubridate_1.7.4 xml2_1.2.2 bridgesampling_0.7-2 codetools_0.2-16 splines_3.6.0
[16] shinythemes_1.1.2 zeallot_0.1.0 bayesplot_1.7.0 jsonlite_1.6 broom_0.5.2
[21] shiny_1.3.2 compiler_3.6.0 httr_1.4.1 backports_1.1.4 assertthat_0.2.1
[26] Matrix_1.2-17 lazyeval_0.2.2 cli_1.1.0 later_0.8.0 htmltools_0.3.6
[31] prettyunits_1.0.2 tools_3.6.0 igraph_1.2.4.1 coda_0.19-3 gtable_0.3.0
[36] glue_1.3.1 reshape2_1.4.3 gmodels_2.18.1 cellranger_1.1.0 vctrs_0.2.0
[41] gdata_2.18.0 nlme_3.1-139 crosstalk_1.0.0 ps_1.3.0 rvest_0.3.4
[46] mime_0.7 miniUI_0.1.1.1 gtools_3.8.1 LearnBayes_2.15.1 MASS_7.3-51.4
[51] zoo_1.8-6 scales_1.0.0 colourpicker_1.0 hms_0.5.1 promises_1.0.1
[56] Brobdingnag_1.2-6 parallel_3.6.0 inline_0.3.15 expm_0.999-4 shinystan_2.5.0
[61] yaml_2.2.0 gridExtra_2.3 loo_2.1.0 StanHeaders_2.18.1-10 stringi_1.4.3
[66] dygraphs_1.1.1.6 e1071_1.7-2 boot_1.3-22 pkgbuild_1.0.5 rlang_0.4.0
[71] pkgconfig_2.0.2 matrixStats_0.54.0 lattice_0.20-38 rstantools_1.5.1 htmlwidgets_1.3
[76] processx_3.4.1 tidyselect_0.2.5 plyr_1.8.4 magrittr_1.5 R6_2.4.0
[81] generics_0.0.2 DBI_1.0.0 mgcv_1.8-28 withr_2.1.2 pillar_1.4.2
[86] haven_2.1.1 units_0.6-4 xts_0.11-2 abind_1.4-5 modelr_0.1.5
[91] crayon_1.3.4 KernSmooth_2.23-15 grid_3.6.0 readxl_1.3.1 callr_3.3.1
[96] threejs_0.3.1 digest_0.6.20 classInt_0.4-1 xtable_1.8-4 httpuv_1.5.1
[101] stats4_3.6.0 munsell_0.5.0 shinyjs_1.0


It might be that under certain conditions CAR models don't work with methods that require refitting, but this should result in an informative error rather than a crash. Can you please try out the dev version of brms from GitHub and, if that also crashes R, provide a minimal reproducible example for me to try out?
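(For reference, a minimal sketch of installing the dev version; the repository is paul-buerkner/brms, and remotes is just one option for the install tooling.)

install.packages("remotes")
# install the development version of brms from GitHub
# (on Windows this assumes Rtools is available for any dependencies that need compilation)
remotes::install_github("paul-buerkner/brms")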

That may be - I just did two very simple regressions and kfold worked.

Most of my models have either smooth terms or a CAR structure, so I wonder if that’s why they crash.

Well, I compare them using waic and loo, but I get alerts suggesting I should use reloo = TRUE or kfold instead. Should I stick with waic and loo anyway?
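(As an aside, those alerts come from the Pareto k diagnostics of PSIS-LOO. A minimal sketch of inspecting them before resorting to reloo or kfold; loo1 is just a hypothetical name for the stored result.)

loo1 <- loo(fit1)
# observations with Pareto k above 0.7 are the ones triggering the
# suggestion to use reloo = TRUE or kfold
loo::pareto_k_table(loo1)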

I think you could be running out of memory - try running a kfold on each model separately. You can then compare them using loo_compare(..., criterion = "kfold").
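(A minimal sketch of that workflow, assuming the two fits from above; add_criterion stores the result inside each fitted object so that loo_compare can pick it up.)

# run the cross-validation for each model on its own to keep memory use down
fit0 <- add_criterion(fit0, "kfold", K = 10)
fit1 <- add_criterion(fit1, "kfold", K = 10)

# compare the stored kfold results
loo_compare(fit0, fit1, criterion = "kfold")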

I don't have a good answer for this, really, but I would appreciate your input on whether the dev version actually throws something more helpful.

No problem - as soon as R finishes the model that's currently running, I'll install the dev version and let you know.

Well, I get the mysterious "non-zero exit status" error when trying to install the dev version of brms. Trying to troubleshoot that.

Meanwhile, I tried the above suggestion about running a kfold on each model separately, and that also immediately crashed R. This can't be a memory problem; other models I'm running have got to be far more memory-intensive than a k-fold for this particular one.

OK, got the dev version 2.10.3 installed, and no change. The R session just aborts with no error as soon as I start kfold.

I will need a fully reproducible minimal example in this case to see what happens on my Windows machine.

I will see what I can do; my models use confidential health data.

Paul, can I e-mail this to you? This model uses areal data, so I can simulate the study in a way I can’t with point data. It will be fastest if I can just send you an RData file with the data and model objects already in it (in addition to the script).

It could also be simulated data, as long as it produces the error you encountered.
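(For reference, a minimal sketch of the kind of simulation that could stand in for the confidential data. The lattice, sample size, and variable values below are made up; only the model structure mirrors fit0 above.)

library(brms)
set.seed(1)

# 50 simulated regions arranged in a chain, each adjacent to the next
n_regions <- 50
w1 <- matrix(0, n_regions, n_regions)
for (i in seq_len(n_regions - 1)) {
  w1[i, i + 1] <- w1[i + 1, i] <- 1
}
rownames(w1) <- colnames(w1) <- seq_len(n_regions)

d <- data.frame(
  ID    = factor(seq_len(n_regions)),
  cases = rpois(n_regions, 10)
)

fit_sim <- brm(cases ~ 1,
               data = d,
               autocor = cor_car(W = w1, ~ 1 | ID, type = "bym2"),
               family = poisson(),
               chains = 2, iter = 2000, cores = 2)

kfold(fit_sim, K = 10)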

kfold() is working with a sim that I made - I ran the sim with 2000 iterations per chain, but my main model used 20,000 iterations because of ESS issues. I wonder if it is running out of memory. This happens even on a 32 GB machine.

Do you think it would be valid to fit versions of the model with fewer samples and compare that way? The problem is that some of these models take a very long time to run (a recent one took 3 weeks to finish) and even creating smaller versions with fewer samples could be impractical.

20,000 iterations sounds like too many for a Stan model. How much ESS do you want to achieve? Perhaps you can increase the ESS in other ways, such as using more informative priors.

I cannot tell whether fitting and comparing the models with fewer samples is valid or not. It depends on how much information kfold and loo require, which in turn depends on how close the models actually are.

It may, however, be the only solution if R otherwise keeps crashing and you don't have more RAM available.
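(If refitting with fewer draws turns out to be the only practical route, a minimal sketch using update(); the iteration count and priors below are illustrative placeholders, not recommendations for this data.)

# refit with fewer post-warmup draws and somewhat tighter priors,
# which may also help with the ESS/Rhat warnings at lower iteration counts
fit1_small <- update(fit1,
                     prior = c(prior(normal(0, 1), class = "b"),
                               prior(student_t(3, 0, 1), class = "sds")),
                     iter = 4000,
                     cores = 4)

fit1_small <- add_criterion(fit1_small, "kfold", K = 10)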

I was getting an ESS/Rhat alert with 10,000 iterations, so I just doubled it. I'm fairly new to CAR models, but I'm definitely running into that alert at higher iteration counts than with other model types.