Cmdstanr crashes R-session after successful sampling of model with many parameters

Note: I posted an earlier version of this with a much more complex model, but I realised this issue is completely independent of the model, so here’s a much simpler reproducible example (I deleted the previous complicated version).

The problem

cmdstanr crashes the R-session after successfully sampling from a model with many parameters. I think it has something to do with how cmdstanr summarises or assesses the sampling. A clean R-session will also crash when trying to read the stored csv files using cmdstanr::read_cmdstan_csv() or cmdstanr::as_cmdstan_fit(). However, the same stored csv files can be read successfully with rstan::read_stan_csv(), so I’m confident that the model-fitting itself was successful.
This has come up in a project working with a large database of bird observations from the last 56 years: the North American Breeding Bird Survey. The models from that project work fine for bird species with ~50-60K observations, but this R-crash occurs for the more data-rich species with ~100K observations (which result in ~250K parameters). I’d like to be able to apply my model to all of the species in the database and to stick with cmdstanr for my entire workflow. Of course, I’d also like the R-session not to crash after fitting a model.

Reproducible Example

Here’s a simple reproducible example that suggests something about the number of parameters causes the crash.
It is a simple linear regression model with 250K observations.

library(cmdstanr)

N = 250000

x = rnorm(N)

y = x+rnorm(N,0,0.3)

stan_data <- list(N = N,
                  y = y,
                  x = x)

mod <- "models/simple_regression.stan"
model <- cmdstan_model(mod)

The model

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real a;
  real b;
  real<lower=0> sigma;
}

model {
  sigma ~ student_t(3, 0, 1);
  b ~ std_normal();
  a ~ std_normal();

  y ~ normal(a + b * x, sigma);
}

generated quantities {
  vector[N] log_lik;

  for (i in 1:N) {
    log_lik[i] = normal_lpdf(y[i] | a + b * x[i], sigma);
  }
}

Crashes after fitting

This call to model$sample crashes the R-session after sampling is complete. The csv output files are stored. It takes ~10 minutes to sample and write the files; then, with no errors or warnings, the R-session crashes. The crash happens both in a stand-alone R-session and in RStudio.


stanfit <- model$sample(
  data=stan_data,
  refresh=200,
  chains=4, 
  iter_sampling=1000,
  iter_warmup=1000,
  parallel_chains = 4,
  output_dir = "output",
  output_basename = "simple_regression_fit")


The csv files can be read with rstan

This rstan::read_stan_csv call works, although it takes a long time to read in the files.

csv_files <- paste0("output/simple_regression_fit-",1:4,".csv")
stanfit <- rstan::read_stan_csv(csv_files, col_major = TRUE) ## successful reading of csv files with rstan

But trying to read or load the files with cmdstanr causes R-crash

Trying to read in the csv files with cmdstanr causes the R-session to crash. The crash happens quickly (within a few seconds), there is no indication from the operating system of a memory issue or any other issue, and there is no other indication of an error. The session crashes both within a stand-alone R-session and in RStudio.

### this as_cmdstan_fit call crashes the R-session
stanfit <- as_cmdstan_fit(files = csv_files)

### similarly, this read_cmdstan_csv call crashes the R-session
stanfit <- read_cmdstan_csv(
 files = csv_files, # csv_files already includes the "output/" directory prefix
 variables = "",
 sampler_diagnostics = NULL,
 format = "draws_list") # following note about efficiency in ?cmdstanr::draws
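
A possible diagnostic, not something tested in this thread: ask read_cmdstan_csv() for only the three scalar parameters and leave out log_lik, to see whether the crash depends on how many columns are actually read. The sketch below assumes the same file names as above and relies on the documented behaviour that an empty string for sampler_diagnostics reads none of them. If the crash is triggered while parsing the very wide csv header, this call may crash as well.

```r
# Sketch: read only a, b and sigma from the stored csv files.
# Guarded so that it does nothing if cmdstanr or the files are missing.
csv_files <- paste0("output/simple_regression_fit-", 1:4, ".csv")

if (requireNamespace("cmdstanr", quietly = TRUE) && all(file.exists(csv_files))) {
  small <- cmdstanr::read_cmdstan_csv(
    files = csv_files,
    variables = c("a", "b", "sigma"),  # skip the 250K-element log_lik
    sampler_diagnostics = ""           # "" = read no sampler diagnostics
  )
  # post_warmup_draws holds the draws for the requested variables
  print(posterior::summarise_draws(small$post_warmup_draws))
}
```

If this succeeds while the full read crashes, that points at the volume of draws being loaded rather than at the files themselves.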


Session info

Running on a Windows computer with 16 cores and 128GB of RAM (so it’s not a question of memory, I don’t think)


utils::sessionInfo()

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rstan_2.21.5 ggplot2_3.3.6 StanHeaders_2.21.0-7 cmdstanr_0.5.2

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 pillar_1.7.0 compiler_4.2.0 prettyunits_1.1.1 tools_4.2.0 pkgbuild_1.3.1 jsonlite_1.8.0 lifecycle_1.0.1
[9] tibble_3.1.7 gtable_0.3.0 checkmate_2.1.0 pkgconfig_2.0.3 rlang_1.0.2 cli_3.3.0 DBI_1.1.3 parallel_4.2.0
[17] xfun_0.31 loo_2.5.1 gridExtra_2.3 withr_2.5.0 dplyr_1.0.9 knitr_1.39 generics_0.1.2 vctrs_0.4.1
[25] stats4_4.2.0 grid_4.2.0 tidyselect_1.1.2 inline_0.3.19 glue_1.6.2 R6_2.5.1 processx_3.6.1 fansi_1.0.3
[33] distributional_0.3.0 tensorA_0.36.2 callr_3.7.0 farver_2.1.0 purrr_0.3.4 posterior_1.2.2 magrittr_2.0.3 codetools_0.2-18
[41] matrixStats_0.62.0 ps_1.7.1 backports_1.4.1 scales_1.2.0 ellipsis_0.3.2 abind_1.4-5 assertthat_0.2.1 colorspace_2.0-3
[49] utf8_1.2.2 RcppParallel_5.1.5 munsell_0.5.0 crayon_1.5.1

Your problem is almost certainly memory. Your log_lik variable is going to generate 250K values per iteration, and Stan saves 4K draws by default (4 chains × 1,000 sampling iterations). That’s 1 billion values at 8 bytes each, for a total of 8GB of data.
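
The arithmetic behind that estimate can be checked in a few lines of R. This is only a back-of-the-envelope sketch using the numbers from this thread; it counts the raw doubles for log_lik and ignores any extra copies R or the reading package may make while loading.

```r
# Rough size of the log_lik draws alone
n_obs            <- 250000  # length of log_lik (one value per observation)
n_chains         <- 4
n_iter           <- 1000    # post-warmup iterations per chain
bytes_per_double <- 8

n_values <- n_obs * n_chains * n_iter           # total stored values
size_gb  <- n_values * bytes_per_double / 1e9   # size in GB (10^9 bytes)

n_values  # 1e+09
size_gb   # 8
```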

On my iMac Pro (circa 2020) with 64GB of memory, this example samples in about 4m (using 4 parallel chains), then takes about the same amount of time to load the data.

@Jonah should know what the total memory overhead requirement is for RStan and whether it’s much higher than the size of the draws being loaded.

Thanks @Bob_Carpenter. Memory limits totally make sense (the stored csv files add up to 8-10 GB), although I’m at a loss as to what limit I’m hitting. The machines I’m running on have 64 GB (or more) of memory available.
Is this a Windows thing? It sounds like the example runs fine on your iMac?
I’ve reproduced the crash on both the Windows 10 machine that I referenced in the original post, and on a Windows 11 machine.

It definitely works fine on my iMac. I don’t have a Windows machine on which to try it there.
