Stan abort, R crash, or other error when sampling relatively simple model

rstan is not happy with me. It aborts when I attempt to sample from relatively simple models. Abortions occur differently depending on how many cores or chains I request, but (at least with this data/model) no combination seems to work. One common error message is the following:

Error in unserialize(socklist[[n]]) : error reading from connection
*Error in serialize(data, node$con, xdr = FALSE) : *

  • error writing to connection*

But sometimes R simply crashes. The failure appears to occur at the start of post-warmup sampling.This behavior appeared since I upgraded my RStudio to 1.2.1335, R to 3.6.0, and rstan to 2.18.2 (running under Windows 7).

Others have experienced this too over the last few years as I see in these forums. Many of us saw it a few months ago at an R class given by J Gabry. However no consistent resolution (that I can comprehend) or even a general understanding of the behavior seems available. I’ve used Stan happily and successfully for 3+ years but I am deeply challenged by concepts like “Makevars.win”.

I’m hoping some kind person with greater understanding might look at the following code and model and give me insight to avoid these abortions. I apologize if I’ve not put this in the right Forum category or improperly formatted this note. I very much appreciate any suggestions!! Thank you. :-)

Here is my R code:
rm (list = ls (all = TRUE)) # caution: removes everything
library (rstan)
rstan_options (auto_write = TRUE)
options (mc.cores = parallel::detectCores ())

dataList ← list(
nobs = 92,
nbatch = 11,
batch= c(10,10,10,10,10,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,
3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,
6,6,6,6,7,7,7,7,8,8,8,8,9,9,10,10,10,10,10,11,11,11),
X = c(0,1,2,3,6,52,52,53,35,35,35,35,35,35,35,35,35,35,35,35,35,35,36,36,36,36,36,7,7,7,7,7,7,7,7,
7,7,7,7,8,8,8,8,24,36,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,35,35,35,36,36,36,36,36,37,
22,22,22,36,22,22,22,36,21,22,22,36,0,12,0,1,2,3,6,68,68,69),
Y = c(95.0,104.0,111.0,103.0,97.0,92.8,104.6,104.7,107.8,101.4,93.9,101.3,93.7,97.1,99.8,102.1,102.1,
106.4,100.3,99.2,99.6,102.5,108.6,99.2,98.5,107.0,96.3,108.1,123.5,114.6,123.0,107.6,118.8,114.3,
116.4,120.1,124.3,103.7,106.2,131.3,119.7,114.0,116.4,117.0,110.0,110.0,115.8,103.9,105.2,101.3,102.7,
106.1,97.5,99.5,102.7,107.8,106.9,108.6,98.2,115.1,105.1,83.0,86.4,85.1,89.3,85.0,89.3,84.7,
92.1,76.8,100.3,110.9,94.7,97.0,92.5,88.7,86.7,86.0,85.9,86.4,78.2,81.0,106.0,100.0,95.0,
98.0,110.0,103.0,105.0,79.8,94.0,84.2),
R = c(110,117,94,90,98,100,93,105,113,103,111,112,111,121),
A = c(1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
nrel = 14)

comp_model ← stan_model(‘Stan model.stan’)

print(comp_model)

StanFit ← sampling(comp_model, data=dataList, iter=2000, chains=4)

Here is my ‘Stan model.stan’ file (which btw I have used successfully in other analyses):
data {
int<lower=1> nobs; //number of stability data points
int<lower=1> nrel; //number of release values
real Y[nobs]; //measured stability responses
real R[nrel]; //measured release responses
real X[nobs]; //storage time in months
int<lower=1> nbatch; //number of batches on stability
int<lower=1, upper=nbatch> batch[nobs]; //stability batch id
real A[nobs];
}

parameters {
vector[2] beta; //intercept and slope
real<lower=0> sigma; //error sd
vector<lower=0>[2] sigma_u; //batch intercept-slope sds
cholesky_factor_corr[2] L_u;
matrix[2,nbatch] z_u;
real delta;
}

transformed parameters{
matrix[2,nbatch] u; // stability batch effects
matrix[2,2] C;
u = diag_pre_multiply(sigma_u, L_u) * z_u; //batch random effects
C = L_u*L_u’; // correlation matrix

}

model {
real mu;

//priors
L_u ~ lkj_corr_cholesky(2.0);
to_vector(z_u) ~ normal(0,1);
//likelihood
for (i in 1:nobs){
mu = beta[1] + u[1,batch[i]] + (beta[2] + delta*A[i] + u[2,batch[i]]) * X[i];
Y[i] ~ normal(mu, sigma);
}

for (i in 1:nrel){
R[i] ~ normal(beta[1], sqrt(sigma^2 + sigma_u[1]^2));
}
}

Below is the console output (some unimportant stuff removed):
R version 3.6.0 (2019-04-26) – “Planting of a Tree”
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

rm (list = ls (all = TRUE)) # caution: removes everything
#options(max.print=100000)
library (rstan)
Loading required package: ggplot2
Registered S3 methods overwritten by ‘ggplot2’:
method from
[.quosures rlang
c.quosures rlang
print.quosures rlang
Loading required package: StanHeaders
rstan (Version 2.18.2, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
For improved execution time, we recommend calling
Sys.setenv(LOCAL_CPPFLAGS = ‘-march=native’)
although this causes Stan to throw an error on a few processors.

comp_model ← stan_model(‘Stan model.stan’)
recompiling to avoid crashing R session

StanFit ← sampling(comp_model, data=dataList, iter=2000, chains=4)
Error in unserialize(socklist[[n]]) : error reading from connection
Error in serialize(data, node$con, xdr = FALSE) :
error writing to connection

I would try a clean R sessions without the

Sys.setenv(LOCAL_CPPFLAGS = '-march=native')

and then run your model with chains = 1 to hopefully see a better error message.

Thanks Ben. I will definitely do this but please be patient with me. I am not familiar with the "
Sys.setenv(LOCAL_CPPFLAGS = ‘-march=native’)" command. It is not something I used in my R code or Stan code. So I have no clue how to implement your suggestion. Would you please share some key details so I can can give this a try? :-)

You already had that line in your post; I’m just suggesting to take it out.

Yes I see. That was an advisory message produced when rstan was loaded. I NEVER follow that advice because of the scary “although this causes Stan to throw an error on a few processors”.

So I followed the second part of your advice (first part apparently not needed) and I can report that the output starts going to the console (instead of the viewer) and then … “R Session aborted” (bomb appears etc). Nothing informative.

Meanwhile I have tried simplifying my data/model. When I finally get it down to a simple linear regression (removing some covariables and the call to lkj_corr_cholesky) it runs OK*. Do you think this abortion could in part be due to a mismatch between model and data (the error was reading/writing to a connection)? And why would the error only appear post-warmup (the warmup seems to go fine - the instant sampling starts - crash)? Any thoughts appreciated

*I still was not able to get the model to run as desired. However I note that the data set is unbalanced. When I run a simulated, balanced data set, the desired model runs perfectly. Therefore I am concluding the error occurs because the data is inadequate to estimate the model and Stan is burping. However I am surprised that the error message implies there is a read/write/connection error. Could an inability to properly sample result in a read/write/connection error (we would hope for relevant error feedback)? And why was warmup OK and the error only occurred in the sampling phase?? Would love to know the answers.

So I can confirm that the error was due to a “mismatch” between data and model. The model was essentially a random slope random intercept model (Gelman & Hill, section 11.1) but one group only had 2 time points (no df for error). When that group was omitted, rstan ran fine. So my bad, though not obvious that would be a problem. But perhaps this experience will help someone else. The error message is not very helpful because it suggests there is something wrong with the installation (read/write/connection error) when it could be an indication of model/data mis-match (error of 3rd kind). So much for a day’s work. :-)