Segfault when executing stan() with rstan on Linux computing cluster

Operating System: CentOS Linux release 7.4.1708 (Core)
Interface Version: R version 3.5.1, RStudio 1.1.456
Compiler/Toolkit: gcc/7.2.0

Dear forum,

I’m running code on a Linux computing cluster. Installing and loading rstan works without problems. However, when I execute a stan() command (with only 10 iterations, for testing):
f1 <- stan(model_code = mstr, data = dList, iter = 10, init = 0,
           chains = 1, seed = seed, chain_id = 1, pars = paramInit)
the RStudio session aborts after about 30 seconds, and the following errors are logged:

[5771899.545016] traps: rsession[7206] trap invalid opcode ip:7f15b5b6ab3a sp:7fffe14f5430 error:0 in rstan.so[7f15b59f5000+361000]
[5771916.238232] traps: rsession[24955] trap invalid opcode ip:7f1fce4cab3a sp:7ffd2bd167e0 error:0 in rstan.so[7f1fce355000+361000]
[5771939.609177] traps: rsession[26429] trap invalid opcode ip:7f287b94db3a sp:7ffdeac7c3d0 error:0 in rstan.so[7f287b7d8000+361000]
[5772252.142301] traps: rsession[32637] trap invalid opcode ip:7fe3e3efcb3a sp:7ffc7bbeba50 error:0 in rstan.so[7fe3e3d87000+361000]
[5772342.611217] traps: rsession[23741] trap invalid opcode ip:7f847d1afb3a sp:7ffc20b54b50 error:0 in rstan.so[7f847d03a000+361000]
[5772475.612125] traps: rsession[31746] trap invalid opcode ip:7f8d58430b3a sp:7ffca83cf780 error:0 in rstan.so[7f8d582bb000+361000]
[5772497.630892] traps: rsession[12279] trap invalid opcode ip:7f37ea75db3a sp:7ffc7128bf40 error:0 in rstan.so[7f37ea5e8000+361000]
[5772861.781192] traps: rsession[13520] trap invalid opcode ip:7f557e4c9b3a sp:7fff03b34e40 error:0 in rstan.so[7f557e354000+361000]
[5774646.314608] rsession[29295]: segfault at 20 ip 00007fbb1ed8d560 sp 00007fff52159250 error 4 in Rcpp.so[7fbb1ed41000+5c000]
[5774891.787188] rsession[15764]: segfault at 20 ip 00007f91823bf560 sp 00007ffc72b800a0 error 4 in Rcpp.so[7f9182373000+5c000]
[5775072.730078] rsession[6620]: segfault at 20 ip 00007fb964448560 sp 00007ffdfdddcfa0 error 4 in Rcpp.so[7fb9643fc000+5c000]

The same code has worked on a local (Windows) machine in the past, so the model itself shouldn’t be the problem. I also allocate 30 GB to the job, whereas the code previously ran fine with less than 10 GB.
The IT support at my institute suggested posting this question here, as it might be related to memory access or segmentation problems when rstan compiles the Stan model…?

Thanks for any help!
Johannes

I’m not sure, but my suspicion is that your login nodes and your compute nodes have slightly different CPU architectures and what compiles on one will often not run on the other. (I’ve seen a few clusters with this annoying misfeature.)
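The "trap invalid opcode ... in rstan.so" lines in the first log fit this idea: the CPU was handed an instruction it does not support. One way to check is to compare the instruction-set flags the kernel reports on each node type. The sketch below assumes x86 Linux nodes and a directory both node types can write to; `cpu_flags()` and `missing_on_run_node()` are hypothetical helper names, not part of any package.

```r
# Hypothetical helpers (not from any package): read the instruction-set flags
# that the Linux kernel reports for this node's CPU.
cpu_flags <- function(lines = readLines("/proc/cpuinfo")) {
  # Only the first "flags" line is used, assuming all cores are identical.
  flag_line <- grep("^flags\\s*:", lines, value = TRUE)[1]
  if (is.na(flag_line)) return(character(0))
  sort(unique(strsplit(sub("^flags\\s*:\\s*", "", flag_line), "\\s+")[[1]]))
}

# Flags available where the code was compiled but absent where it runs;
# machine code that uses any of them can trap with "invalid opcode".
missing_on_run_node <- function(build_flags, run_flags) {
  setdiff(build_flags, run_flags)
}

# Usage (assumes a shared filesystem):
#   on the login node:  saveRDS(cpu_flags(), "flags_login.rds")
#   in a compute job:   saveRDS(cpu_flags(), "flags_compute.rds")
#   then anywhere:      missing_on_run_node(readRDS("flags_login.rds"),
#                                           readRDS("flags_compute.rds"))
```

An empty result means the compute node supports everything the login node does; a non-empty one (for example an AVX-512 flag) would make code built with host-specific optimizations on the login node a likely culprit.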

When you say “Installing and loading rstan works without problems”, are you testing this on a login node or a compute node? What happens if you run a test job with an R file that contains only the following line:

library(rstan)

Basically, I want to see if RStan will die just when loading on a compute node.
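A slightly more informative variant of that test job could print which node it ran on and load rstan's compiled dependencies one at a time. This is a hypothetical sketch (`report_load()` is a made-up name, and the script is assumed to be run with `Rscript` from whatever job script the cluster's scheduler uses). A native crash such as an invalid opcode kills the R process outright instead of raising an R error, so the progress messages go to stderr via `message()`, which is not buffered, and the last "Loading ..." line in the job log names the package that died.

```r
# Hypothetical test script for a compute-node job, e.g. Rscript check_rstan_load.R
report_load <- function(pkgs) {
  message("Node: ", Sys.info()[["nodename"]])
  vapply(pkgs, function(p) {
    message("Loading ", p, " ...")
    ok <- requireNamespace(p, quietly = TRUE)  # loads the package's shared library
    if (ok) {
      message("  OK, version ", as.character(packageVersion(p)))
    } else {
      message("  not installed")
    }
    ok
  }, logical(1))
}

# Load the compiled dependencies before rstan itself.
report_load(c("Rcpp", "RcppEigen", "StanHeaders", "rstan"))
```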

Also, can you run RStan from a login node? (If running your real model on a login node isn’t viable, maybe you can try the Eight Schools example model.)
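If the model file isn't at hand, the Eight Schools model can also be passed inline through `model_code`, the same argument used in the original `stan()` call. The sketch below is a non-centered version along the lines of the RStan Getting Started example, written with `vector` declarations so it should parse with both older and current Stan releases. The run settings (one chain, 200 iterations, seed 1) are arbitrary smoke-test choices, and the `requireNamespace()` guard simply skips sampling where rstan isn't installed.

```r
# Non-centered Eight Schools model, inlined so no 8schools.stan file is needed.
schools_code <- "
data {
  int<lower=0> J;           // number of schools
  vector[J] y;              // estimated treatment effects
  vector<lower=0>[J] sigma; // standard errors of the effect estimates
}
parameters {
  real mu;                  // population treatment effect
  real<lower=0> tau;        // between-school standard deviation
  vector[J] eta;            // standardized school-level deviations
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * eta;   // school treatment effects
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"

schools_dat <- list(J = 8,
                    y = c(28, 8, -3, 7, -1, 1, 18, 12),
                    sigma = c(15, 10, 16, 11, 9, 11, 10, 18))

if (requireNamespace("rstan", quietly = TRUE)) {
  # Compiling and loading this model exercises the same code path that crashed.
  fit <- rstan::stan(model_code = schools_code, data = schools_dat,
                     chains = 1, iter = 200, seed = 1)
  print(fit)
}
```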

ETA: Did you install RStan from a login node or a compute node?

Hi James,

thanks for your reply. We have rstan installed and loading on both the login and the compute node (the installation failed at first, which is how we came across that distinction), and a job that only runs library(rstan) works.

Running the eight schools example on the login node via the console gives the following output:

> library(rstan)
Loading required package: StanHeaders
Loading required package: ggplot2
rstan (Version 2.19.2, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
> schools_dat <- list(J = 8, 
+                     y = c(28,  8, -3,  7, -1,  1, 18, 12),
+                     sigma = c(15, 10, 16, 11,  9, 11, 10, 18))
> fit <- stan(file = '8schools.stan', data = schools_dat)

 *** caught segfault ***
address 0x20, cause 'memory not mapped'

Traceback:
 1: Module(module, mustStart = TRUE)
 2: .getModulePointer(x)
 3: new("Module", .xData = <environment>)$stan_fit4model31fc1fcddda3_8schools
 4: new("Module", .xData = <environment>)$stan_fit4model31fc1fcddda3_8schools
 5: eval(call("$", mod, paste("stan_fit4", model_cppname, sep = "")))
 6: eval(call("$", mod, paste("stan_fit4", model_cppname, sep = "")))
 7: object@mk_cppmodule(object)
 8: .local(object, ...)
 9: sampling(sm, data, pars, chains, iter, warmup, thin, seed, init,     check_data = TRUE, sample_file = sample_file, diagnostic_file = diagnostic_file,     verbose = verbose, algorithm = match.arg(algorithm), control = control,     check_unknown_args = FALSE, cores = cores, open_progress = open_progress,     include = include, ...)
10: sampling(sm, data, pars, chains, iter, warmup, thin, seed, init,     check_data = TRUE, sample_file = sample_file, diagnostic_file = diagnostic_file,     verbose = verbose, algorithm = match.arg(algorithm), control = control,     check_unknown_args = FALSE, cores = cores, open_progress = open_progress,     include = include, ...)
11: stan(file = "8schools.stan", data = schools_dat)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Any idea what’s happening?

This is usually due to Rcpp being compiled with a different compiler or set of compiler flags than RStan.
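To see what "a different compiler or set of compiler flags" means on a given node, one can ask R which C++ compiler and flags it passes to source package builds, and look for user overrides in `~/.R/Makevars` (the default per-user location). A sketch, assuming R 3.4 or later for the `CXX14` variables; `makevars_compiler_lines()` is a hypothetical helper. A `-march=native` flag in those overrides, which some older RStan installation guides suggested, ties the compiled code to the CPU of the node where it was built and would also fit the "invalid opcode" traps in the first post.

```r
# Ask R which C++ compiler and flags it uses for source package builds.
r_bin <- file.path(R.home("bin"), "R")
for (var in c("CXX", "CXXFLAGS", "CXX14", "CXX14FLAGS")) {
  val <- system2(r_bin, c("CMD", "config", var), stdout = TRUE)
  cat(sprintf("%-11s %s\n", var, paste(val, collapse = " ")))
}

# Hypothetical helper: keep only compiler-related assignments from Makevars text.
makevars_compiler_lines <- function(lines) {
  keep <- grepl("^\\s*(CC|CFLAGS|CXX[0-9]*|CXX[0-9]*FLAGS)\\s*[+:]?=", lines)
  trimws(lines[keep])
}

# User-level overrides apply to every source install, so Rcpp, RcppEigen and
# rstan should all be built while the same settings are in effect.
mk <- file.path(Sys.getenv("HOME"), ".R", "Makevars")
if (file.exists(mk)) print(makevars_compiler_lines(readLines(mk)))
```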

Thanks, that was it! Reinstalling Rcpp (and RcppEigen) and rstan fixed it in the end.
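For anyone hitting the same problem, the fix amounts to rebuilding these compiled packages from source in one go, with the same compiler environment active (the gcc/7.2.0 module in the original post) and, on a cluster like this, ideally on the same kind of node the jobs will run on. A minimal sketch: `reinstall_stan_stack()` is a hypothetical wrapper, the CRAN mirror URL is just one possible choice, and it should be run from a fresh R session in which none of these packages has been loaded yet.

```r
# Hypothetical wrapper: rebuild Rcpp, RcppEigen and rstan from source so they
# are all compiled with the compiler and flags that are active right now.
reinstall_stan_stack <- function(pkgs = c("Rcpp", "RcppEigen", "rstan"),
                                 repos = "https://cloud.r-project.org") {
  install.packages(pkgs, repos = repos, type = "source")
  # Report the versions that ended up installed.
  vapply(pkgs, function(p) as.character(packageVersion(p)), character(1))
}

# reinstall_stan_stack()
```

Installing all three in a single `install.packages()` call lets R order the builds by their dependencies, so rstan is compiled against the freshly built Rcpp and RcppEigen.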