Cmdstanr cannot allocate vector of size 2.0 Gb

It looks like cmdstanr has trouble reading some large CSV files that rstan reads without problems:

> fit.cmdstanr <- cmdstanr::as_cmdstan_fit(dir(pattern="*.csv", full.names=TRUE))
Error: cannot allocate vector of size 2.0 Gb
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
> fit.rstan <- rstan::read_stan_csv(dir(pattern="*.csv", full.names=TRUE))
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /data/R/lib/R/lib/
LAPACK: /data/R/lib/R/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6           pillar_1.4.7         compiler_4.0.3      
 [4] cmdstanr_0.3.0.9000  prettyunits_1.1.1    tools_4.0.3         
 [7] pkgbuild_1.2.0       jsonlite_1.7.2       lifecycle_0.2.0     
[10] tibble_3.0.5         checkmate_2.0.0      gtable_0.3.0        
[13] pkgconfig_2.0.3      rlang_0.4.10         DBI_1.1.1           
[16] cli_2.2.0            parallel_4.0.3       curl_4.3            
[19] xfun_0.20            loo_2.4.1            gridExtra_2.3       
[22] dplyr_1.0.3          withr_2.4.0          knitr_1.30          
[25] generics_0.1.0       vctrs_0.3.6          tidyselect_1.1.0    
[28] stats4_4.0.3         grid_4.0.3           glue_1.4.2          
[31] inline_0.3.17        data.table_1.13.6    R6_2.5.0            
[34] processx_3.4.5       fansi_0.4.2          rstan_2.21.2        
[37] purrr_0.3.4          callr_3.5.1          ggplot2_3.3.3       
[40] posterior_0.1.3      magrittr_2.0.1       codetools_0.2-16    
[43] matrixStats_0.58.0   backports_1.2.1      scales_1.1.1        
[46] ps_1.5.0             ellipsis_0.3.1       StanHeaders_2.21.0-7
[49] assertthat_0.2.1     abind_1.4-5          colorspace_2.0-0    
[52] V8_3.4.0             munsell_0.5.0        RcppParallel_5.0.2  
[55] crayon_1.3.4    

It may have something to do with fread, as I get the following when I try to read the same files again after the above error:

> fit.cmdstanr <- cmdstanr::as_cmdstan_fit(dir(pattern="*.csv", full.names=TRUE))
Error in data.table::fread(cmd = fread_cmd, colClasses = "character",  : 
  File '/tmp/RtmpN1k88I/file69065a40c474' does not exist or is non-readable. getwd()=='/data/Torsten/example-models/effCpt'

Unfortunately I cannot disclose the .csv files.

@jonah @rok_cesnovar


Thanks @yizhang for reporting.

Are you able to read it in with

# Strip the '#' comment lines with grep, then parse the rest with fread
fread_cmd <- paste0("grep -v '^#' --color=never ", output_file)
draws <- data.table::fread(
  cmd = fread_cmd,
  data.table = FALSE
)

Put your csv filepath in output_file.

Looks like fread works here:

> fread_cmd <- paste0("grep -v '^#' --color=never ", "*.csv")
> draws <- data.table::fread(
+           cmd = fread_cmd,
+           data.table = FALSE
+  )
> object.size(draws)
16192859000 bytes

@rok_cesnovar Having noticed in the previous post that the draws consume a lot of memory (>16 GB), I switched to a machine with more memory (32 GB), and cmdstanr works just fine there. So I guess I was running out of memory, and the real question is how we can reduce the memory footprint of cmdstanr: the total size of the CSVs is only ~2 GB, far less than 16 GB.
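
For anyone who wants to check the gap between on-disk and in-memory size on their own files, here is a minimal sketch. It is not what cmdstanr does internally: it swaps in base R's read.csv() for fread so that it runs without extra packages, and the file, the column names (lp__, theta, y_hat) and the `keep` vector are made up for illustration. It compares file.size() with object.size(), then reads only a subset of columns, which is one possible way to lower the footprint when only a few parameters are needed.

```r
# Write a small stand-in for a Stan CSV (hypothetical contents):
# one '#' comment line followed by a header and numeric draws.
csv_file <- tempfile(fileext = ".csv")
set.seed(1)
draws_demo <- data.frame(
  lp__  = rnorm(1000),
  theta = rnorm(1000),
  y_hat = rnorm(1000)
)
con <- file(csv_file, "w")
writeLines("# hypothetical sampler comment line", con)
write.csv(draws_demo, con, row.names = FALSE, quote = FALSE)
close(con)

# Read all columns, skipping '#' comment lines, and compare sizes.
all_draws  <- read.csv(csv_file, comment.char = "#")
disk_bytes <- file.size(csv_file)
mem_bytes  <- as.numeric(object.size(all_draws))
cat("on disk:", disk_bytes, "bytes; in memory:", mem_bytes, "bytes\n")

# Read only the columns of interest: a "NULL" entry in colClasses
# tells read.csv to skip that column entirely.
header      <- names(read.csv(csv_file, comment.char = "#", nrows = 1))
keep        <- c("lp__", "theta")  # assumed columns of interest
col_classes <- ifelse(header %in% keep, "numeric", "NULL")
some_draws  <- read.csv(csv_file, comment.char = "#",
                        colClasses = col_classes)
cat("subset in memory:", as.numeric(object.size(some_draws)), "bytes\n")
```

With real output, replace csv_file with the path to one of the Stan CSV files and `keep` with the parameters you actually need.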


Thanks for that info.

I will investigate the memory footprint. I guess read_cmdstan_csv() and the draws() method are the two functions to look at, but mostly the first one. I opened an issue for this: "Investigate memory consumption on reading in CSV" (Issue #445, stan-dev/cmdstanr on GitHub).
