Contrasts problem with "reloo" and "kfold"

Hello!

I am unable to compute grouped k-fold cross-validation with brms and the development version of rstan.

EDIT: The same error also occurs when I use reloo = TRUE because the Pareto k values in loo are too high.

Here is a minimal reproducible example:

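library(brms)

# Four groups of 20 observations each; grp is stored as a character vector
D <- data.frame(grp = rep(LETTERS[1:4], each = 20))
D$y <- rnorm(nrow(D))

# Intercept-only model, then k-fold CV with folds defined by grp
fit <- brm(y ~ 1, data = D, cores = 2, chains = 2, iter = 1000)
kfold(fit, group = "grp")

It leads to the following error: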

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels
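
For context, this message comes from base R's contrast handling and can be triggered without brms. The snippet below is a minimal sketch under that assumption: it uses a made-up three-row data frame and only base R, and it does not show where inside brms the error is raised.

```r
# Hypothetical toy data: a factor with a single level
d <- data.frame(y = c(0.1, -0.3, 0.8), grp = factor(c("A", "A", "A")))

# Building a design matrix needs contrasts for grp, which requires >= 2 levels
msg <- tryCatch(
  {
    model.matrix(y ~ grp, data = d)
    NA_character_  # only reached if no error is raised
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

In an English locale, the printed message should match the second line of the error above.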

Does someone have any idea how to solve this?

Thank you very much!
Lucas

sessionInfo()
R version 4.0.1 (2020-06-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=fr_CA.UTF-8       LC_NUMERIC=C               LC_TIME=fr_CA.UTF-8        LC_COLLATE=fr_CA.UTF-8     LC_MONETARY=fr_CA.UTF-8   
 [6] LC_MESSAGES=fr_CA.UTF-8    LC_PAPER=fr_CA.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] brms_2.13.0     Rcpp_1.0.4.6    readxl_1.3.1    lubridate_1.7.9 forcats_0.5.0   stringr_1.4.0   dplyr_1.0.0     purrr_0.3.4    
 [9] readr_1.3.1     tidyr_1.1.0     tibble_3.0.1    ggplot2_3.3.2   tidyverse_1.3.0

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1     ellipsis_0.3.1       rio_0.5.16           ggridges_0.5.2       rsconnect_0.8.16     markdown_1.1        
  [7] base64enc_0.1-3      fs_1.4.1             rstudioapi_0.11      listenv_0.8.0        farver_2.0.3         rstan_2.21.1        
 [13] DT_0.13              fansi_0.4.1          mvtnorm_1.1-1        xml2_1.3.2           bridgesampling_1.0-0 codetools_0.2-16    
 [19] shinythemes_1.1.2    bayesplot_1.7.2      jsonlite_1.6.1       packrat_0.5.0        broom_0.5.6          dbplyr_1.4.4        
 [25] shiny_1.4.0.2        compiler_4.0.1       httr_1.4.1           backports_1.1.8      assertthat_0.2.1     Matrix_1.2-18       
 [31] fastmap_1.0.1        cli_2.0.2            later_1.1.0.1        htmltools_0.5.0      prettyunits_1.1.1    tools_4.0.1         
 [37] igraph_1.2.5         coda_0.19-3          gtable_0.3.0         glue_1.4.1           reshape2_1.4.4       V8_3.2.0            
 [43] carData_3.0-4        cellranger_1.1.0     vctrs_0.3.1          nlme_3.1-147         crosstalk_1.1.0.1    globals_0.12.5      
 [49] ps_1.3.3             rvest_0.3.5          openxlsx_4.1.5       mime_0.9             miniUI_0.1.1.1       lifecycle_0.2.0     
 [55] gtools_3.8.2         future_1.17.0        zoo_1.8-8            scales_1.1.1         colourpicker_1.0     hms_0.5.3           
 [61] promises_1.1.1       Brobdingnag_1.2-6    parallel_4.0.1       inline_0.3.15        shinystan_2.5.0      curl_4.3            
 [67] gridExtra_2.3        loo_2.2.0            StanHeaders_2.21.0-5 stringi_1.4.6        dygraphs_1.1.1.6     pkgbuild_1.0.8      
 [73] zip_2.0.4            rlang_0.4.6          pkgconfig_2.0.3      matrixStats_0.56.0   lattice_0.20-41      labeling_0.3        
 [79] rstantools_2.0.0     htmlwidgets_1.5.1    processx_3.4.2       tidyselect_1.1.0     plyr_1.8.6           magrittr_1.5        
 [85] R6_2.4.1             generics_0.0.2       DBI_1.1.0            withr_2.2.0          pillar_1.4.4         haven_2.3.1         
 [91] foreign_0.8-79       xts_0.12-0           abind_1.4-5          modelr_0.1.8         crayon_1.3.4         car_3.0-8           
 [97] grid_4.0.1           data.table_1.12.8    blob_1.2.1           callr_3.4.3          threejs_0.3.3        reprex_0.3.0        
[103] digest_0.6.25        xtable_1.8-4         httpuv_1.5.4         RcppParallel_5.0.1   stats4_4.0.1         munsell_0.5.0       
[109] shinyjs_1.1     

OK, I am running into so many bugs related to factors/characters in the functions that process posterior draws that I will do a clean install with everything from CRAN and see what happens…

Hi ldeschamps,

Did you already solve the problem? I noticed that your model does not include the grouping variable grp, so it does not generate predictions for the groups separately.
Does the problem persist if you try fit <- brm(y ~ grp, data = D, cores = 2, chains = 2, iter = 1000) instead?

Best,
Julian

Hi!

Thank you for your answer!

The biggest problem I encountered came from defining the categorical variable as a character instead of a factor. Now I carefully define all categorical variables as factors (which is good, because it forces me to choose the levels). The funny thing is that I had converted all factors to characters earlier because I was tired of dealing with missing levels after removing some categories.
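
A minimal base R sketch of that workflow, using a made-up data frame: the levels are chosen explicitly, and droplevels() is one way to get rid of levels that become empty after some categories are removed.

```r
# Hypothetical data: a grouping variable stored as character
D <- data.frame(grp = rep(LETTERS[1:4], each = 5), stringsAsFactors = FALSE)

# Convert to a factor and choose the levels explicitly
D$grp <- factor(D$grp, levels = c("A", "B", "C", "D"))

# Remove one category: the now-empty level "D" is still listed
sub <- D[D$grp != "D", , drop = FALSE]
levels_before <- levels(sub$grp)

# Drop the unused level instead of falling back to character
sub$grp <- droplevels(sub$grp)
levels_after <- levels(sub$grp)

print(levels_before)
print(levels_after)
```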

As for this particular example, my understanding is that the categorical effect has to be hierarchical (a group-level effect) in order to compute group-wise cross-validation. If it is entered as a “fixed” effect, the design matrix changes between folds and the folds can no longer be compared.
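
The design-matrix point can be seen with base R alone. This is only a sketch under assumptions: it mimics a single fold by holding out group "D" by hand and dropping its level, and it says nothing about how kfold builds its folds internally. The commented-out brm call at the end is an assumed way to write the hierarchical version, not a tested fix.

```r
set.seed(1)
D <- data.frame(grp = factor(rep(LETTERS[1:4], each = 20)))
D$y <- rnorm(nrow(D))

# Design matrix with grp as a "fixed" effect on the full data
X_full <- model.matrix(y ~ grp, data = D)

# Mimic one fold: hold out group "D" and drop its now-empty level
train <- droplevels(D[D$grp != "D", , drop = FALSE])
X_train <- model.matrix(y ~ grp, data = train)

# The training fold has lost the grpD column
print(colnames(X_full))
print(colnames(X_train))

# Assumed hierarchical alternative in brms (not run here):
# fit_h <- brm(y ~ 1 + (1 | grp), data = D)
```

With a group-level term, the population-level design matrix contains only the intercept, so it has the same columns in every fold.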

All the best!
Lucas