Reading the thread speedup-by-using-external-blas-lapack-with-cmdstan, I was reminded that I wanted to make sure we had the right compiler arguments set for our RStan code.
We run a very large number of Stan models (more than 10k hours per month, by my current estimate), so every little bit counts.
I set the compiler flags in Makeconf following Configuring-C-Toolchain-for-Linux. That is, I set:
CXX14FLAGS=-O3 -march=native -mtune=native -fPIC
CXX14=g++
My finding was that the 8 schools model worked, but our own models did not. Stan and R would crash when sampling started, with the error:
SAMPLING FOR MODEL '838f06335e6a3b7704453ca29ed6ed1b' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.1e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
double free or corruption (out)
Aborted
After digging into it quite a bit, I realised that it is related to AVX instructions. That is, if I set -march and -mtune to ‘westmere’, which is the last generation without AVX, our models work. If I additionally add -mavx to enable AVX, they crash.
As shown below, I find this problem on R 4.1 and above but not on R 4.0.5 and below. I have only tested RStan version 2.21.2 (GitRev: 2e1f913d3ca3). However, as I change Docker images between those tests, other packages may change as well.
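To double-check which flag sets actually switch on AVX code generation, one option is to ask the compiler for its predefined macros. The sketch below assumes GCC (g++) on an x86-64 machine; the helper name avx_enabled is made up for illustration. It relies on GCC defining __AVX__ only when AVX is enabled.

```shell
#!/usr/bin/env bash
# Report whether a given set of compiler flags turns on AVX code generation.
# GCC's -dM -E prints the predefined macros; __AVX__ is defined only when
# AVX is enabled. avx_enabled is a hypothetical helper name.
avx_enabled() {
  local cxx="${CXX:-g++}"
  "$cxx" "$@" -dM -E -x c++ /dev/null 2>/dev/null | grep -q '#define __AVX__ 1'
}

for flags in "-march=westmere -mtune=westmere" \
             "-march=westmere -mtune=westmere -mavx" \
             "-march=native -mtune=native"; do
  # Intentional word splitting: $flags holds several separate options.
  if avx_enabled $flags; then
    echo "$flags => AVX on"
  else
    echo "$flags => AVX off"
  fi
done
```

What -march=native reports depends on the CPU the command runs on, so it should be run on the same machine (or container host) that compiles the models.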
All tests are on Ubuntu 20.04.3 LTS (Focal Fossa) on an Intel CPU with “Kaby Lake” architecture.
I will greatly appreciate any help in figuring out how to fix this issue.
Reproduce the error
As I cannot share our internal models, I have reproduced the behaviour in a toy model. These are the steps:
Using Docker, start an R container from the rocker project:
docker run -it rocker/r-ver:4.1.2 bash
Install V8:
apt-get update
apt-get install libv8-dev -y
Put the compiler flags into Makeconf
echo "CXX14FLAGS=-O3 -march=native -mtune=native" >> /usr/local/lib/R/etc/Makeconf
echo "CXX14=g++" >> /usr/local/lib/R/etc/Makeconf
Open R and install RStan:
install.packages("rstan")
Run the following R code to get the error. The Stan code contains a matrix-vector multiplication (M * beta), which is important for triggering the crash.
library(rstan)
stan_code <- "
data {
matrix[3,3] M;
vector[3] y;
}
parameters {
vector[3] beta;
}
model {
beta ~ normal(4, 1);
y ~ normal(M * beta, 1);
}
"
dat <- list(M = matrix(c(5,4,8,3,9,1,4,2,6), nrow = 3),
y=c(2.5, 4.2, 2.))
fit <- stan(model_code=stan_code, data = dat)
Here are my findings:
The model will crash with:
-march=native -mtune=native
and with -march=westmere -mtune=westmere -mavx
but will work fine with
-march=westmere -mtune=westmere
The behaviour is observed in rocker/r-ver:4.1 and rocker/r-ver:4.1.2. There is no issue in rocker/r-ver:4.0.4 and rocker/r-ver:4.0.5.
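The flag combinations listed above could be run back to back with a small script instead of editing Makeconf by hand each time. This is only a sketch under a few assumptions: repro.R is a hypothetical file holding the R code from the reproduction steps, the function names are made up, and it assumes that a user-level Makevars file (selected through the R_MAKEVARS_USER environment variable) takes precedence over the values in Makeconf.

```shell
#!/usr/bin/env bash
# Write a user-level Makevars file holding one candidate flag set.
# write_makevars and run_matrix are hypothetical helper names.
write_makevars() {
  local file="$1" flags="$2"
  printf 'CXX14FLAGS=%s\nCXX14=g++\n' "$flags" > "$file"
}

# Run the reproduction once per flag set, each in a fresh R process.
# repro.R is assumed to contain the toy model and the stan() call.
run_matrix() {
  local flags tmp
  for flags in "-O3 -march=westmere -mtune=westmere" \
               "-O3 -march=westmere -mtune=westmere -mavx" \
               "-O3 -march=native -mtune=native"; do
    tmp="$(mktemp)"
    write_makevars "$tmp" "$flags"
    # A non-zero exit status (for example from "Aborted") counts as a crash.
    if R_MAKEVARS_USER="$tmp" Rscript repro.R >/dev/null 2>&1; then
      echo "OK    $flags"
    else
      echo "CRASH $flags"
    fi
    rm -f "$tmp"
  done
}
```

Calling run_matrix inside each rocker container would give one OK/CRASH line per flag set. Note that this only changes the flags used when the model is compiled; packages that are already installed keep whatever flags they were built with.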
(PS: I am not reporting this as a bug, as I believe the problem lies outside RStan. But I am stuck figuring out where to look.)