Chains finish unexpectedly in new install of CmdStanR

JamieHogg-depo · August 9, 2022, 9:06pm

Hi all,

I am trying to install CmdStanR on Windows 10. I have installed cmdstan via a Conda environment and can get the example model to compile but not to sample. My problem is very similar to the one described here.

file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")

# compile model
mod <- cmdstan_model(file) # this works fine!

# sampling
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))
fit <- mod$sample( # this does not!
  data = data_list, 
  seed = 123, 
  chains = 4, 
  parallel_chains = 4,
  refresh = 500
)

But this fails with the following error.

Running MCMC with 4 parallel chains...

Warning: Chain 1 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 3 finished unexpectedly!

Warning: Chain 4 finished unexpectedly!

Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Warning messages:
1: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
 
2: No chains finished successfully. Unable to retrieve the fit.

Following the error messages, I ran fit$output_files(), but there were no CSV files (i.e. the output was character(0)). I thought it might have something to do with the parallel chains, but removing this option has no effect on the error message.
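For anyone landing here with the same symptom, a minimal sketch for checking what the failed run actually left on disk. The helper name describe_output_files is made up for illustration, and the include_failed argument in the commented usage is an assumption about the installed cmdstanr version; fit$output(1) is the method the warning itself points to.

```r
# Hypothetical helper: report which output files exist and how large they are.
describe_output_files <- function(paths) {
  paths <- as.character(paths)
  exists <- file.exists(paths)
  size <- file.size(paths)  # NA for files that do not exist
  data.frame(
    path = paths,
    exists = exists,
    bytes = ifelse(exists, size, 0),
    stringsAsFactors = FALSE
  )
}

# Possible usage with a fit object like the one above (not run here):
# describe_output_files(fit$output_files(include_failed = TRUE))  # assumes include_failed is supported
# fit$output(1)  # console output captured for chain 1
```

A zero-row result matches the character(0) reported above and suggests the chains stopped before CmdStan wrote anything.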

I have also tried

fit <- cmdstanr_example(chains = 1)

but this fails with a similar error.

Any ideas as to what could be the problem here? Thank you in advance.

My current R and system info below:

> cmdstan_path()
[1] "C:/Users/n9401849/Anaconda3/envs/stan/Library/bin/cmdstan"
> cmdstan_version()
[1] "2.30.1"
> Sys.info()
         sysname          release          version         nodename          machine 
       "Windows"         "10 x64"    "build 19044" "QUT-PA00146740"         "x86-64" 
           login             user   effective_user 
      "n9401849"       "n9401849"       "n9401849" 
> R.version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          4                           
minor          0.5                         
year           2021                        
month          03                          
day            31                          
svn rev        80133                       
language       R                           
version.string R version 4.0.5 (2021-03-31)
nickname       Shake and Throw

DominiqueMakowski · August 10, 2022, 12:16am

I re-installed all the latest versions yesterday as well and encountered the same error (with both the meanfield and sampling algorithms, on one chain and on multiple chains).

There’s this message that popped up too:

Compiling Stan program...
Start sampling
Running MCMC with 4 parallel chains...

Warning: Chain 1 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 3 finished unexpectedly!

Warning: Chain 4 finished unexpectedly!

Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Error in cmdstanr::read_cmdstan_csv(out$output_files(), variables = "",  : 
  Assertion on 'files' failed: No file provided.
In addition: Warning messages:
1: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
 
2: No chains finished successfully. Unable to retrieve the fit.

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_Singapore.utf8  LC_CTYPE=English_Singapore.utf8   
[3] LC_MONETARY=English_Singapore.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Singapore.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] brms_2.17.5    Rcpp_1.0.8.3   cmdstanr_0.5.3

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-8    jsonlite_1.8.0       gtools_3.9.3        
 [4] StanHeaders_2.21.0-7 RcppParallel_5.1.5   threejs_0.3.3       
 [7] shiny_1.7.2          assertthat_0.2.1     posterior_1.2.2     
[10] distributional_0.3.0 stats4_4.2.0         tensorA_0.36.2      
[13] pillar_1.8.0         backports_1.4.1      lattice_0.20-45     
[16] glue_1.6.2           digest_0.6.29        promises_1.2.0.1    
[19] checkmate_2.1.0      colorspace_2.0-3     htmltools_0.5.2     
[22] httpuv_1.6.5         Matrix_1.4-1         plyr_1.8.7          
[25] dygraphs_1.1.1.6     pkgconfig_2.0.3      rstan_2.21.5        
[28] purrr_0.3.4          xtable_1.8-4         mvtnorm_1.1-3       
[31] scales_1.2.0         processx_3.7.0       later_1.3.0         
[34] tibble_3.1.8         bayesplot_1.9.0      generics_0.1.3      
[37] farver_2.1.1         ggplot2_3.3.6        ellipsis_0.3.2      
[40] DT_0.23              shinyjs_2.1.0        cli_3.3.0           
[43] crayon_1.5.1         magrittr_2.0.3       mime_0.12           
[46] ps_1.7.1             fansi_1.0.3          nlme_3.1-157        
[49] xts_0.12.1           pkgbuild_1.3.1       colourpicker_1.1.1  
[52] prettyunits_1.1.1    tools_4.2.0          loo_2.5.1           
[55] lifecycle_1.0.1      matrixStats_0.62.0   stringr_1.4.0       
[58] munsell_0.5.0        callr_3.7.0          compiler_4.2.0      
[61] rlang_1.0.4          grid_4.2.0           ggridges_0.5.3      
[64] rstudioapi_0.13      htmlwidgets_1.5.4    crosstalk_1.2.0     
[67] igraph_1.3.1         miniUI_0.1.1.1       base64enc_0.1-3     
[70] codetools_0.2-18     gtable_0.3.0         inline_0.3.19       
[73] abind_1.4-5          DBI_1.1.2            markdown_1.1        
[76] reshape2_1.4.4       R6_2.5.1             gridExtra_2.3       
[79] rstantools_2.2.0     zoo_1.8-10           knitr_1.39          
[82] bridgesampling_1.1-2 dplyr_1.0.9          fastmap_1.1.0       
[85] utf8_1.2.2           shinythemes_1.2.0    shinystan_2.6.0     
[88] stringi_1.7.6        parallel_4.2.0       vctrs_0.4.1         
[91] tidyselect_1.1.2     xfun_0.31            coda_0.19-4

JamieHogg-depo · August 14, 2022, 9:53pm

Sorry to hear you’re having a similar problem, @DominiqueMakowski. Following the recommendations in this post, I thought it best to tag @rok_cesnovar here, as this post has been unanswered for 5 days. Apologies if the post is now outdated, Rok.

Fabian_Crespo · October 13, 2022, 9:25am

I am having the same problem on an HPC cluster running Linux. I am using cmdstanr version 0.5.3 and CmdStan version 2.30.1.

Running MCMC with 4 parallel chains...

Warning: Chain 3 finished unexpectedly!

Warning: Chain 2 finished unexpectedly!

Warning: Chain 1 finished unexpectedly!

Chain 4 Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
.......

andrjohns · October 13, 2022, 9:54am

Does this happen for all Stan models or just for a particular one? Can you run the following example model:

cmdstanr_example("logistic", method = "sample", quiet = FALSE)

Fabian_Crespo · October 13, 2022, 10:41am

Thanks for the quick reply.

I ran the example model and it finished successfully:

starting worker pid=336573 on localhost:11478 at 12:34:34.099
[1] 4
Running MCMC with 4 chains, at most 48 in parallel...

Chain 1 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 1 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 1 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 1 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 1 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 1 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 1 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 1 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 1 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 1 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 1 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 1 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 1 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 1 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 1 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 1 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 1 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 1 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 1 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 1 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 1 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 1 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 2 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 2 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 2 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 2 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 2 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 2 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 2 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 2 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 2 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 2 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 2 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 2 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 2 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 2 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 2 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 2 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 2 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 2 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 2 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 2 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 2 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 2 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 3 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 3 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 3 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 3 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 3 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 3 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 3 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 3 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 3 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 3 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 3 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 3 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 3 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 3 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 3 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 3 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 3 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 3 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 3 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 3 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 3 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 3 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 4 Iteration:    1 / 2000 [  0%]  (Warmup) 
Chain 4 Iteration:  100 / 2000 [  5%]  (Warmup) 
Chain 4 Iteration:  200 / 2000 [ 10%]  (Warmup) 
Chain 4 Iteration:  300 / 2000 [ 15%]  (Warmup) 
Chain 4 Iteration:  400 / 2000 [ 20%]  (Warmup) 
Chain 4 Iteration:  500 / 2000 [ 25%]  (Warmup) 
Chain 4 Iteration:  600 / 2000 [ 30%]  (Warmup) 
Chain 4 Iteration:  700 / 2000 [ 35%]  (Warmup) 
Chain 4 Iteration:  800 / 2000 [ 40%]  (Warmup) 
Chain 4 Iteration:  900 / 2000 [ 45%]  (Warmup) 
Chain 4 Iteration: 1000 / 2000 [ 50%]  (Warmup) 
Chain 4 Iteration: 1001 / 2000 [ 50%]  (Sampling) 
Chain 4 Iteration: 1100 / 2000 [ 55%]  (Sampling) 
Chain 4 Iteration: 1200 / 2000 [ 60%]  (Sampling) 
Chain 4 Iteration: 1300 / 2000 [ 65%]  (Sampling) 
Chain 4 Iteration: 1400 / 2000 [ 70%]  (Sampling) 
Chain 4 Iteration: 1500 / 2000 [ 75%]  (Sampling) 
Chain 4 Iteration: 1600 / 2000 [ 80%]  (Sampling) 
Chain 4 Iteration: 1700 / 2000 [ 85%]  (Sampling) 
Chain 4 Iteration: 1800 / 2000 [ 90%]  (Sampling) 
Chain 4 Iteration: 1900 / 2000 [ 95%]  (Sampling) 
Chain 4 Iteration: 2000 / 2000 [100%]  (Sampling) 
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.1 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.5 seconds.

   variable   mean median   sd  mad     q5    q95 rhat ess_bulk ess_tail
 lp__       -65.97 -65.65 1.46 1.23 -68.80 -64.29 1.00     2112     2751
 alpha        0.38   0.38 0.22 0.22   0.03   0.73 1.00     4231     3068
 beta[1]     -0.67  -0.66 0.25 0.25  -1.08  -0.26 1.00     4380     2711
 beta[2]     -0.27  -0.27 0.22 0.22  -0.64   0.09 1.00     3819     2875
 beta[3]      0.68   0.67 0.27 0.27   0.25   1.14 1.00     3975     3173
 log_lik[1]  -0.51  -0.51 0.10 0.10  -0.69  -0.37 1.00     4178     3274
 log_lik[2]  -0.40  -0.38 0.15 0.14  -0.68  -0.20 1.00     4617     3387
 log_lik[3]  -0.50  -0.46 0.22 0.20  -0.89  -0.21 1.00     4110     3021
 log_lik[4]  -0.45  -0.43 0.15 0.14  -0.72  -0.24 1.00     3726     3085
 log_lik[5]  -1.19  -1.17 0.29 0.28  -1.68  -0.75 1.00     4578     2913

 # showing 10 of 105 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
Error while shutting down parallel: unable to terminate some child processes

If it is something related to my model, how can I make cmdstan print the actual error message?

The same Stan model runs locally without errors.
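One way to look for the underlying error is to run the compiled model executable directly and capture everything it prints, together with its exit status. This is only a sketch: run_capture is a made-up helper, and data_json and out_csv in the commented usage are placeholder paths. mod$exe_file() and the "sample data file=... output file=..." argument form come from cmdstanr and the CmdStan command line respectively.

```r
# Hypothetical helper: run a command, returning its exit status and combined output.
run_capture <- function(command, args = character()) {
  out <- suppressWarnings(
    system2(command, args = args, stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")  # only set when the exit status is non-zero
  list(
    status = if (is.null(status)) 0L else status,
    output = as.character(out)
  )
}

# Possible usage with a compiled model (not run here; paths are placeholders):
# exe  <- mod$exe_file()
# args <- c("sample",
#           "data", paste0("file=", data_json),
#           "output", paste0("file=", out_csv))
# res  <- run_capture(exe, args)
# res$status
# cat(res$output, sep = "\n")
```

A non-zero status with empty output would suggest the process died before Stan itself could report anything, which points at the environment rather than the model.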

charlesm93 · March 5, 2023, 10:48pm

Hi, it doesn’t look like this issue has been resolved. A colleague of mine, who works on Windows, recently ran into the same problem. It’d be really great to find a solution.

Tagging some people who’ve worked on cmdstanr.

@mitzimorris @jonah

charlesm93 · March 29, 2023, 12:12pm

Hi all,
@WardBrian found a solution to this problem: Guide: Diagnosing a complete lack of output on Windows

This is a temporary fix or hack, and we’re working on a permanent solution.

dark-dante · April 25, 2023, 6:59am

Hi there, it seems this issue also occurs on Linux. I am using a machine with the following configuration:

OS: Ubuntu 22.04
cmdstan: “2.31.0”

christineshen · May 21, 2024, 1:19pm

I’m also encountering the same issue on Linux. I’m running on a computer cluster using slurm_id. Each cmdstan run uses 10 chains. I noticed that I can’t use dopar to parallelize the runs; otherwise they fail at random. My runs can now finish, but when I check the log I see this error message: Error while shutting down parallel: unable to terminate some child processes.

OS: AlmaLinux 9.3 (Shamrock Pampas Cat)
cmdstan: 2.34.1

brock · August 6, 2024, 11:40am

I’m having the same symptom. Here’s the $output() of one of the chains. I also tried using read_cmdstan_csv on the CSVs mentioned, but it didn’t work. I’m on Ubuntu and playing around with OpenCL and different BLAS libraries, so it’s quite possible that I’ve broken the C environment in some way. I’m just wondering how I can get a real traceback, crash report, or something similar so I can diagnose this.

Browse[1]> fit$output(1)

method = sample (Default)
  sample
    num_samples = 2000
    num_warmup = 1000 (Default)
    save_warmup = false (Default)
    thin = 1 (Default)
    adapt
      engaged = true (Default)
      gamma = 0.05 (Default)
      delta = 0.8 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
      save_metric = false (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
    num_chains = 1 (Default)
id = 1 (Default)
data
  file = /tmp/RtmpLWwvH2/standata-c00524c0f8546.json
init = 2 (Default)
random
  seed = 73
output
  file = /tmp/RtmpLWwvH2/m14.9-1-202408061334-1-779ddc.csv
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = /tmp/RtmpLWwvH2/m14.9-1-profile-202408061334-1-85dbce.csv
  save_cmdstan_config = false (Default)
num_threads = 1 (Default)
opencl
  device = -1 (Default)
  platform = -1 (Default)
opencl_platform_name = NVIDIA CUDA
opencl_device_name = NVIDIA RTX A5000


Gradient evaluation took 0.002921 seconds
1000 transitions using 10 leapfrog steps per transition would take 29.21 seconds.
Adjust your expectations accordingly!


Browse[1]> read_cmdstan_csv('/tmp/RtmpLWwvH2/m14.9-1-202408061334-1-779ddc.csv')
Error: Supplied CSV file does not contain any variable names or data!
Browse[1]> read_cmdstan_csv('/tmp/RtmpLWwvH2/m14.9-1-profile-202408061334-1-85dbce.csv')
Error in read_cmdstan_csv("/tmp/RtmpLWwvH2/m14.9-1-profile-202408061334-1-85dbce.csv") : 
  Assertion on 'files' failed: File does not exist: '/tmp/RtmpLWwvH2/m14.9-1-profile-202408061334-1-85dbce.csv'.
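When read_cmdstan_csv() rejects a file like this, reading it as plain text can show how far the run got before it stopped. A minimal sketch, assuming the usual Stan CSV layout (comment lines starting with "#", then one header row of column names, then draws); inspect_stan_csv is a made-up helper name.

```r
# Hypothetical helper: summarise how much of a Stan CSV was written.
inspect_stan_csv <- function(path) {
  lines <- readLines(path, warn = FALSE)
  is_comment <- startsWith(lines, "#")
  body <- lines[!is_comment & nzchar(lines)]  # header row plus any draws
  list(
    n_comment_lines = sum(is_comment),
    has_header = length(body) >= 1,
    n_draw_rows = max(length(body) - 1L, 0L),
    last_lines = tail(lines, 5)  # where the file stops
  )
}

# Possible usage (not run here):
# inspect_stan_csv("/tmp/RtmpLWwvH2/m14.9-1-202408061334-1-779ddc.csv")
```

If has_header is FALSE, the file holds only the configuration comments, which is consistent with the chain ending before the first draw was written.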
