Within-chain parallelization error: All variables in all chains must have the same length

I first ran the mediation analysis with within-chain parallelization (see the code below) on my PC. It worked fine with one of my simulated datasets (which I will call d12 here).

  • Operating System: Windows 10
  • CmdStan Version: 0.4.0
outcomeMod <- bf(y ~ mediatorCount + pF + rc
                 + (1|i) + (1 + rc|j),
                 family = "bernoulli")

mediatorMod <- bf(mediatorCount ~ talk + pF + (1|i) + (1 + pF|j),
                  family = poisson())

fit <- brm(outcomeMod + mediatorMod + set_rescor(FALSE),
           data = data[[10]], iter = 16000, chains = 4, cores = 4, 
           threads = 2, backend = "cmdstanr",
           seed = 29, inits = 0, save_pars(group = T, all = T), control=list(adapt_delta=0.99))

I then tried to run the same mediation model on a cluster with a slight tweak: I increased the number of threads from 2 to 4 (the rest of the code stayed exactly the same) and used a different simulated dataset (d10 instead of d12). It gave me the following error message.

  • Operating System: CentOS 7.9
  • CmdStan Version: 0.4.0
  • Compiler/Toolkit: gnu 9.3.0
All 4 chains finished successfully.
Mean chain execution time: 29704.0 seconds.
Total execution time: 30050.2 seconds.
grep: write error
grep: write error
grep: write error
grep: write error
Error: All variables in all chains must have the same length.
Execution halted

I am not sure what is going on. But any help is appreciated! Thank you very much!

What do you get without parallelization?

It worked fine without within-chain parallelization.

  • Operating System: CentOS 7.9
  • CmdStan Version: 0.4.0
  • Compiler/Toolkit: gnu 9.3.0

I got another error message from within-chain parallelization. I am running the same model (4 cores, 4 chains, and 4 threads) on another set of simulated datasets with a larger sample size. Of the 11 datasets, 10 resulted in the following error, so the model objects were not saved.

All 4 chains finished successfully.
Mean chain execution time: 61528.8 seconds.
Total execution time: 61689.3 seconds.
grep: write error
Error: Supplied CSV file is corrupt!
Execution halted

One of the datasets resulted in a different message. The model object was saved successfully despite it. But what is that “grep: write error”? Will it affect my results?

All 4 chains finished successfully.
Mean chain execution time: 61902.3 seconds.
Total execution time: 63299.6 seconds.
grep: write error
grep: write error
grep: write error
grep: write error
2128 of 2128 (100.0%) transitions hit the maximum treedepth limit of 10 or 2^10-1 leapfrog steps.
Trajectories that are prematurely terminated due to this limit will result in slow exploration.
Increasing the max_treedepth limit can avoid this at the expense of more computation.
If increasing max_treedepth does not remove warnings, try to reparameterize the model.

Again, the same datasets can run perfectly fine on the same machine (cluster) without within-chain parallelization.

Thank you in advance for your time and help!

Maybe you can post a full mini example so that this can be debugged?

Tagging @rok_cesnovar and @paul.buerkner so that they can have a look. This looks like an error in cmdstanr (unlikely) or brms, I think.


Sure. Here is an example. Thank you very much.
d10.csv (1.6 MB)

I just ran this twice without any issues, using CmdStan 2.28.1.

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  datasets  grDevices utils     methods   base     

other attached packages:
[1] brms_2.16.1 Rcpp_1.0.7 

loaded via a namespace (and not attached):
  [1] nlme_3.1-152         matrixStats_0.59.0   xts_0.12.1          
  [4] threejs_0.3.3        rstan_2.21.2         tensorA_0.36.2      
  [7] tools_4.1.0          backports_1.2.1      utf8_1.2.1          
 [10] R6_2.5.0             DT_0.18              DBI_1.1.1           
 [13] mgcv_1.8-35          projpred_2.0.2       colorspace_2.0-2    
 [16] withr_2.4.2          tidyselect_1.1.1     gridExtra_2.3       
 [19] prettyunits_1.1.1    processx_3.5.2       Brobdingnag_1.2-6   
 [22] curl_4.3.2           compiler_4.1.0       cli_3.0.0           
 [25] shinyjs_2.0.0        colourpicker_1.1.0   posterior_1.0.1     
 [28] scales_1.1.1         dygraphs_1.1.1.6     checkmate_2.0.0     
 [31] mvtnorm_1.1-2        ggridges_0.5.3       callr_3.7.0         
 [34] StanHeaders_2.21.0-7 stringr_1.4.0        digest_0.6.27       
 [37] minqa_1.2.4          base64enc_0.1-3      pkgconfig_2.0.3     
 [40] htmltools_0.5.1.1    lme4_1.1-27.1        fastmap_1.1.0       
 [43] htmlwidgets_1.5.3    rlang_0.4.11         shiny_1.6.0         
 [46] farver_2.1.0         generics_0.1.0       jsonlite_1.7.2      
 [49] zoo_1.8-9            crosstalk_1.1.1      gtools_3.9.2        
 [52] dplyr_1.0.7          distributional_0.2.2 inline_0.3.19       
 [55] magrittr_2.0.1       loo_2.4.1            bayesplot_1.8.1     
 [58] Matrix_1.3-3         munsell_0.5.0        fansi_0.5.0         
 [61] abind_1.4-5          lifecycle_1.0.0      stringi_1.7.3       
 [64] MASS_7.3-54          pkgbuild_1.2.0       plyr_1.8.6          
 [67] grid_4.1.0           parallel_4.1.0       promises_1.2.0.1    
 [70] crayon_1.4.1         miniUI_0.1.1.1       lattice_0.20-44     
 [73] splines_4.1.0        knitr_1.33           ps_1.6.0            
 [76] pillar_1.6.1         igraph_1.2.6         boot_1.3-28         
 [79] markdown_1.1         shinystan_2.5.0      codetools_0.2-18    
 [82] reshape2_1.4.4       stats4_4.1.0         rstantools_2.1.1    
 [85] glue_1.4.2           V8_3.4.2             data.table_1.14.0   
 [88] RcppParallel_5.1.4   vctrs_0.3.8          nloptr_1.2.2.2      
 [91] httpuv_1.6.1         gtable_0.3.0         purrr_0.3.4         
 [94] assertthat_0.2.1     ggplot2_3.3.5        xfun_0.24           
 [97] mime_0.11            xtable_1.8-4         coda_0.19-4         
[100] later_1.2.0          rsconnect_0.8.18     tibble_3.1.2        
[103] shinythemes_1.2.0    gamm4_0.2-6          cmdstanr_0.4.0      
[106] ellipsis_0.3.2       bridgesampling_1.1-2

Thanks @wds15.

@Marian I would suggest installing the GitHub versions of brms and CmdStanR, and potentially CmdStan 2.28, and then trying again.

Thank you very much. I have updated brms and cmdstanr. I tried the same simulated data with 100 iterations, and it worked fine. I have another quick question, though. I used update() with within-chain parallelization, but it seems that update() is not ready for it? Without within-chain parallelization, update() works fine. Thank you!

Here is my code.

  mod = update(fit, newdata = data[[i]], cores = 4, 
               chains = 4, seed = 29, inits = 0, 
               threads = 2, backend = "cmdstanr",
               save_pars(group = T, all = T), control=list(adapt_delta=0.99, max_treedepth = 12),
               recompile = FALSE, file = mypath)

Here is the message I got. “Error: Updating formulas of multivariate models is not yet possible.”
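One possible explanation, not confirmed anywhere in this thread: in the update() call above, save_pars(group = T, all = T) is passed without a name. R fills unnamed arguments into the first free formal arguments in order, so if the brms update method takes the formula as its second argument (an assumption about the package's signature), the save_pars object would be read as a formula update, which could trigger this message for a multivariate model. The sketch below uses only base R to show the matching rule; toy_update and settings are hypothetical stand-ins, not brms code.

```r
# Toy stand-in for a method whose second formal is a formula, mimicking the
# ASSUMED layout update(object, formula., newdata = NULL, ...).
toy_update <- function(object, formula., newdata = NULL, ...) {
  if (missing(formula.)) {
    "formula left unchanged"
  } else {
    "formula update requested"
  }
}

# Hypothetical placeholder for the object returned by save_pars(...)
settings <- list(group = TRUE, all = TRUE)

# Unnamed: the settings object fills the first free formal, which is formula.
res_unnamed <- toy_update("fit", newdata = "dat37", settings)

# Named: the settings object is passed on through ... and formula. stays missing
res_named <- toy_update("fit", newdata = "dat37", save_pars = settings)

print(res_unnamed)
print(res_named)
```

If this is what is happening, writing save_pars = save_pars(group = T, all = T) in the update() call would be the first thing to try. It would not by itself explain why the call behaves differently without threading or on another machine, so treat it as something to rule out rather than a diagnosis.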

Can you please provide a minimal reproducible example?

Sure. Thank you very much! Here is my code that uses update(). I use mod36.rds for update(). It seems that I can’t upload mod36.rds, so I will just attach dat36.csv and dat37.csv here.

dat36.csv (3.3 MB)

dat37.csv (3.3 MB)

outcomeMod <- bf(y ~ mediatorCount + pF + rc
                 + (1|i) + (1 + rc|j),
                 family = "bernoulli")

mediatorMod <- bf(mediatorCount ~ talk + pF + (1|i) + (1 + pF|j),
                  family = poisson())

fit <- brm(outcomeMod + mediatorMod + set_rescor(FALSE),
           data = dat36, iter = 16000, chains = 4, cores = 4, 
           threads = 4, backend = "cmdstanr",
           seed = 29, inits = 0, 
           save_pars(group = T, all = T), control=list(adapt_delta=0.99))

mod = update(fit, newdata = dat37, cores = 4, chains = 4,
               threads = 4, backend = "cmdstanr",
               seed = 29, inits = 0,
               save_pars(group = T, all = T), control=list(adapt_delta=0.99),
               recompile = FALSE, file = mypath)

I am running the analysis on a cluster with the following setting.

  • Operating System: CentOS 7.9
  • Compiler/Toolkit: gnu 9.3.0

I am also running the same analysis with a smaller sample size on my PC (Windows 10), and update() works fine there. Hence, I am not sure what is going wrong when I run the same analysis on the cluster.

Also, I updated brms and cmdstanr as rok_cesnovar suggested. Within-chain parallelization runs fine with 100 or 500 iterations, but once I increase the iterations to 16000, errors occur again. Here are the two error messages I have received so far (both from runs on the cluster).

First error message:

All 4 chains finished successfully.
Mean chain execution time: 74177.5 seconds.
Total execution time: 93260.0 seconds.
grep: write error
grep: write error
grep: write error
grep: write error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'a real', got '-'
Calls: brm ... .fit_model -> <Anonymous> -> read.csv -> read.table -> scan
In addition: Warning message:
In readLines(f) :
  incomplete final line found on '/tmp/RtmpClYguK/model_fc6ed18001f7a6c3e9c4dd4b7ca655b0-202112142327-1-25e48d.csv'
Execution halted

Second error message:

All 4 chains finished successfully.
Mean chain execution time: 67620.0 seconds.
Total execution time: 70016.8 seconds.
grep: write error
Error: Supplied CSV file is corrupt!
Execution halted

Thank you very much for your help!!


I have resolved two of the error messages mentioned above (the ones that appear after the 4 chains have run successfully). If the error message concerns a corrupt CSV file, or an incomplete final line in a CSV file located in the temp directory, it is caused by a lack of storage space in the tmp directory. On my campus cluster nodes, /tmp is held in memory, and only a small amount of space is allotted to it so that memory stays reserved for compute processes; that space is not enough for the output files. Hence, the error has nothing to do with brms. However, I still have problems using update() with within-chain parallelization on the cluster, though not on a PC or Mac. Thank you!
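For anyone hitting the same CSV errors, here is a minimal base R sketch for checking where the session writes its temporary files and for picking a writable alternative. pick_scratch_dir is a hypothetical helper and SCRATCH is a hypothetical environment variable; substitute whatever scratch location your cluster provides. The sketch checks only that a directory is writable, not how much free space it has.

```r
# Hypothetical helper: return the first candidate directory that exists
# and is writable by the current user.
pick_scratch_dir <- function(candidates) {
  for (d in candidates) {
    # file.access(..., mode = 2) returns 0 when the path is writable
    if (nzchar(d) && dir.exists(d) && file.access(d, mode = 2) == 0) {
      return(d)
    }
  }
  stop("No writable scratch directory found among the candidates.")
}

# Where is this R session writing its temporary files?
cat("Session temp dir:", tempdir(), "\n")

# SCRATCH is an assumed variable name; it is empty if your cluster does not set it,
# in which case the helper falls back to the session temp dir.
scratch <- pick_scratch_dir(c(Sys.getenv("SCRATCH"), tempdir()))
cat("Using scratch dir:", scratch, "\n")
```

R fixes its temporary directory at startup from the TMPDIR environment variable (then TMP and TEMP), so pointing it at a larger disk means setting TMPDIR in the job script before R is launched, not from inside the running session.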