R session aborts when using the 'control' argument

I have been successfully running Stan models while specifying control arguments (e.g. adapt_delta, max_treedepth) without issue until yesterday. The only change I can think of that could have affected this is that I installed the mixOmics package via Bioconductor, following the steps outlined in the mixOmicsTeam/mixOmics GitHub repository (the development repository for the Bioconductor package 'mixOmics').

After this installation, whenever I run a model where I specify the ‘control’ argument, I get an error via an RStudio popup:
[Screenshot of the RStudio error popup: Screen Shot 2021-07-16 at 12.58.37 PM]

If I run chains in parallel using

options(mc.cores = detectCores())

Then the chains start sampling and I see the following output and error:

starting worker pid=25403 on localhost:11771 at 13:08:20.600
starting worker pid=25417 on localhost:11771 at 13:08:20.798

SAMPLING FOR MODEL 'simple-model' NOW (CHAIN 1).
Error in unserialize(socklist[[n]]) : error reading from connection

SAMPLING FOR MODEL 'simple-model' NOW (CHAIN 2).

These both happen with the stan() function and the sampling() function.

I have a 2017 MacBook Pro running Big Sur (11.4). Here is my session info after loading the ‘rstan’ and ‘parallel’ libraries:

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.21.2         ggplot2_3.3.5        StanHeaders_2.21.0-7

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7         pillar_1.6.1       compiler_4.1.0     prettyunits_1.1.1 
 [5] tools_4.1.0        pkgbuild_1.2.0     jsonlite_1.7.2     lifecycle_1.0.0   
 [9] tibble_3.1.2       gtable_0.3.0       pkgconfig_2.0.3    rlang_0.4.11      
[13] DBI_1.1.1          cli_3.0.0          curl_4.3.2         loo_2.4.1         
[17] gridExtra_2.3      withr_2.4.2        dplyr_1.0.7        generics_0.1.0    
[21] vctrs_0.3.8        stats4_4.1.0       grid_4.1.0         tidyselect_1.1.1  
[25] glue_1.4.2         inline_0.3.19      R6_2.5.0           processx_3.5.2    
[29] fansi_0.5.0        purrr_0.3.4        callr_3.7.0        magrittr_2.0.1    
[33] codetools_0.2-18   matrixStats_0.59.0 scales_1.1.1       ps_1.6.0          
[37] ellipsis_0.3.2     assertthat_0.2.1   colorspace_2.0-2   V8_3.4.2          
[41] utf8_1.2.1         RcppParallel_5.1.4 munsell_0.5.0      crayon_1.4.1   

I uninstalled R, RStudio, and Stan and reinstalled them all (making sure to delete all libraries, especially the ones that were installed when I installed mixOmics), and shut down and restarted my computer numerous times. The problem persists.

I can fit models perfectly well without specifying any control arguments: I can specify ‘chains’, ‘iter’, ‘thin’, etc. and it works, both in parallel and in serial.

In case it matters, my Makevars file is the following:

CXX14FLAGS += -O3 -mtune=native -arch x86_64 -ftemplate-depth-256

Here is a minimal reproducible example of what’s going on. My Stan code is:

data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  y ~ normal(mu, sigma);
}

My R code is:

library(rstan); library(parallel);

mu <- 1
sigma <- 1
N <- 100
y <- rnorm(N, mu, sigma)
dat_list <- list(N = N, y = y)

# this works!
res <- stan(file = "/Users/austin/Desktop/simple-model.stan",
            data = dat_list)

# this has an error and aborts R
res <- stan(file = "/Users/austin/Desktop/simple-model.stan",
            data = dat_list, 
            control = list(adapt_delta = 0.95))

# for parallel
options(mc.cores = detectCores())

# this works!
res <- stan(file = "/Users/austin/Desktop/simple-model.stan",
            data = dat_list)

# this has an error: "Error in unserialize(socklist[[n]]) : error reading from connection"
res <- stan(file = "/Users/austin/Desktop/simple-model.stan",
            data = dat_list, 
            control = list(adapt_delta = 0.95))

I’m thoroughly baffled by this, especially given the full reinstallation of R, RStudio, and Stan. Any ideas and help would be greatly appreciated.

I can confirm this on my laptop with OS X. However, I usually don’t run rstan anymore but use cmdstan instead:

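library(cmdstanr)  # provides cmdstan_model()
m0 <- cmdstan_model("~/Downloads/simple-model.stan")
# adapt_delta is passed directly to $sample() rather than via a control list
res <- m0$sample(data = dat_list, adapt_delta = 0.95)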

works great.

Does everything run ok if you run R from Terminal? If so, then this is a good reproducible example to use in an issue at the RStudio GitHub repo.

Correction. This is what I get. RStudio does not crash for me.

@torkar Thanks for the help! I successfully ran the model with the adapt_delta argument using cmdstan. I’d never used cmdstan before, so I downloaded the cmdstanr R package and installed cmdstan fresh via the install_cmdstan() function. So at least now I can run models using cmdstan. Regular rstan still doesn’t work.
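
For anyone following along, the setup described here could look roughly like the sketch below. The wrapper name setup_cmdstan is hypothetical, and the repository URL is the one the cmdstanr documentation listed around the time of this thread, so treat it as an assumption that may have changed. The cmdstanr calls themselves (check_cmdstan_toolchain(), install_cmdstan(), cmdstan_version()) are part of the package.

```r
# Hypothetical helper wrapping the one-time cmdstanr/CmdStan setup.
# Nothing is downloaded until the function is actually called.
setup_cmdstan <- function() {
  # cmdstanr is not on CRAN; this repository URL is an assumption and may have moved.
  if (!requireNamespace("cmdstanr", quietly = TRUE)) {
    install.packages(
      "cmdstanr",
      repos = c("https://mc-stan.org/r-packages/", getOption("repos"))
    )
  }
  cmdstanr::check_cmdstan_toolchain()  # checks that a C++ toolchain is available
  cmdstanr::install_cmdstan()          # downloads and builds CmdStan
  cmdstanr::cmdstan_version()          # returns the installed CmdStan version
}

# Run once per machine:
# setup_cmdstan()
```

After this, cmdstan_model() and $sample() can be used as in the earlier reply.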

@mike-lawrence When I ran stan in the Terminal, it tried to start sampling but then had an error:

SAMPLING FOR MODEL 'simple-model' NOW (CHAIN 1).
Segmentation fault: 11

So I guess it’s not an RStudio issue but instead something else…

To resurrect this thread again, I am getting the same error when I try to specify max_treedepth in brms:

Compiling Stan program...
Start sampling
starting worker pid=3350 on localhost:11512 at 15:25:13.701
starting worker pid=3364 on localhost:11512 at 15:25:14.160

SAMPLING FOR MODEL 'xxxx' NOW (CHAIN 1).

SAMPLING FOR MODEL 'xxxx' NOW (CHAIN 2).
Error in unserialize(socklist[[n]]) : error reading from connection

I am also on a Mac, with the Catalina OS. Most of the time I get this error message, but I have also had RStudio crash completely once.

Try to see what error (if any) you get if you set the number of cores to 1.

@mcol thanks for the reply-- I have already tried this and that was the time that RStudio crashed.

In addition, I start my session by setting

options(mc.cores = parallel::detectCores())

… this should automatically set the number of cores based on my machine, correct?

Yes, that will do it (or you can use the cores argument of brm()). Do the failures only happen if you set the max_treedepth argument? Have you tried running the model using R from the terminal rather than from RStudio? This last point is important for determining whether the problem lies somewhere in the Stan framework or in RStudio itself (which is often implicated in such hard-to-explain crashes).
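
As a small illustration of the two settings discussed here, the sketch below first sets the session-wide default and then forces a single process for debugging. The variable name n_cores is just for illustration. Both rstan and brms read the mc.cores option as the default for their cores argument.

```r
# Detect the number of logical cores; detectCores() can return NA on some platforms.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Session-wide default used for running chains in parallel.
options(mc.cores = n_cores)

# For debugging, force a single process so the underlying error is not hidden
# behind "error reading from connection" from a worker that died.
options(mc.cores = 1L)
```

With mc.cores set to 1 the chains run in the main R process, so a crash in the sampler takes down the session itself instead of showing up as a lost worker connection.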

Running on the command line produces:

Compiling Stan program...
Start sampling

SAMPLING FOR MODEL 'xxxx' NOW (CHAIN 2).

SAMPLING FOR MODEL 'xxxx' NOW (CHAIN 1).
Error in FUN(X[[i]], ...) :
  trying to get slot "mode" from an object of a basic class ("NULL") with no slots
Calls: brm ... eval -> .fun -> .fun -> .local -> sapply -> lapply -> FUN
In addition: Warning message:
In mccollect(jobs) : 2 parallel jobs did not deliver results
Execution halted

I tried adding back cores = 1 in the brms call and running on command line, this time it gives the same error as Austin above.

Compiling Stan program...
Start sampling

SAMPLING FOR MODEL 'xxxx' NOW (CHAIN 1).
Segmentation fault: 11

If I change adapt_delta instead of max_treedepth via the control argument, I get the same errors, whether in RStudio or on the command line. The model runs without the control argument, only issuing a warning (which is what prompted me to change the tree depth in the first place).

My environment:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

other attached packages:
 [1] ggExtra_0.9     tidybayes_3.0.0 forcats_0.5.1   stringr_1.4.0   dplyr_1.0.7     purrr_0.3.4     readr_2.0.0     tidyr_1.1.3     tibble_3.1.3    ggplot2_3.3.5   tidyverse_1.3.1
[12] brms_2.15.0     Rcpp_1.0.7 

Let’s see if @paul.buerkner has further suggestions on how to debug this. It will probably also help him to see the call to brm() you used.

Thank you! Model looks like:

modelRecall <- brm(formula = recalled ~ 1 +
                                         poly(position, 2) +
                                         (1 | logFreq) + (1 | propDensity) + 
                                         (1 + poly(position, 2) | ID),  
                            data=wordData, 
                            family = bernoulli(link = "logit"),
                            prior = c(set_prior("normal(0.567, 0.186)", class = "Intercept")),
                            warmup = 500, 
                            iter = 2000, 
                            chains = 2, 
                            inits= "0",
                            seed = 150,
                            control = list(max_treedepth = 11),
                            file = "./modelcache/modelRecall")

(I should note that I have run this model on another two datasets with the same results; it really does seem to be the control statement that causes the crash, even when a model with the default tree depth produces no warnings)

Hi again,

Just thought I’d note that I am getting a similar set of errors while using the loo() function.

In one of my models I have one pareto_k value above 0.7, and it suggests using moment_match = TRUE. When I do this, the R session aborts. Using reloo() results in:

Fitting model 1 out of 1 (leaving out observation 193)
Start sampling
Error in unserialize(socklist[[n]]) : error reading from connection

I’ve tried upgrading loo to the latest GitHub version, which does not change this behavior.
Setting chains = 1 within reloo() also causes R to abort.

Running on command line with moment_match = TRUE:

R(21273,0x1081aadc0) malloc: *** error for object 0x7ffee8d4e790: pointer being freed was not allocated
R(21273,0x1081aadc0) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6

Running on command line with reloo():

The model will be refit 1 times.

Fitting model 1 out of 1 (leaving out observation 982)
Start sampling
Error in FUN(X[[i]], ...) :
  trying to get slot "mode" from an object of a basic class ("NULL") with no slots
Calls: reloo ... reloo.brmsfit -> <Anonymous> -> value.Future -> signalConditions
Execution halted

Do you think they are related somehow? The command line errors are different, but both of them are crashing RStudio after adding extra specifications or changing the number of chains/cores.

Other useful information on my R session, maybe there is something in here:

Loading required package: Rcpp
Loading 'brms' package (version 2.15.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: ‘brms’

The following object is masked from ‘package:stats’:

    ar

── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.3     ✔ dplyr   1.0.7
✔ tidyr   1.1.3     ✔ stringr 1.4.0
✔ readr   2.0.0     ✔ forcats 0.5.1
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Try to upgrade brms and (if needed) other packages to their latest version.

Thanks for the suggestion. I had only updated the loo package before. I’ve now updated all the packages, and even reinstalled R and RStudio. Unfortunately, I get exactly the same errors. I can use loo normally and specify save_psis = TRUE; it’s only adding moment_match = TRUE or using reloo that results in a crash.

I see quite a few posts about moment_match in this forum. Have you looked through the replies? If so, then perhaps @scholz would know what the problem might be?

@aisa2 have you tried using cmdstan as it seems to have fixed the problem for Austin?

I was trying to avoid having to install cmdstan, so I had not tried this yet, even after seeing Austin’s post, but it seems that there may be no other solution. After wrangling with it for some time, I did get it to work!
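
The post does not say exactly how this was done, so the following is only a sketch of one way it could look: the earlier brm() call with brms’s backend argument set to "cmdstanr". It assumes brms, cmdstanr and CmdStan are installed. The wrapper name fit_recall_cmdstan and its data argument (standing in for wordData) are hypothetical, and the inits and file arguments from the original call are left out to keep it short.

```r
# Hypothetical wrapper: the brm() call from earlier in the thread,
# but sampled through CmdStan instead of rstan.
fit_recall_cmdstan <- function(data) {
  brms::brm(
    formula = recalled ~ 1 + poly(position, 2) +
      (1 | logFreq) + (1 | propDensity) +
      (1 + poly(position, 2) | ID),
    data = data,
    family = brms::bernoulli(link = "logit"),
    prior = brms::set_prior("normal(0.567, 0.186)", class = "Intercept"),
    warmup = 500, iter = 2000, chains = 2, seed = 150,
    control = list(max_treedepth = 11),  # the setting that crashed under rstan
    backend = "cmdstanr"                 # the default backend is "rstan"
  )
}

# Usage, assuming wordData is loaded:
# modelRecall <- fit_recall_cmdstan(wordData)
```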