Thanks a lot for this tutorial!
I’ve tried it with OpenBLAS (through CmdStanR), but I’m having some issues:
- Compilation fails because the linker cannot find -llapacke (error: /bin/ld: cannot find -llapacke).
- If I remove this argument, it compiles successfully and my test model runs, but very slowly: sampling is ~4x slower without within-chain parallelization, and ~6x slower with it.
Some potentially relevant info:
- CmdStan 2.28.2
- I’m using WSL2 on Windows 11 (Ubuntu 20.04.3 LTS - GNU/Linux 5.10.60.1-microsoft-standard-WSL2 x86_64)
- CPU is a Ryzen 9 5950X
- Other cpp_options I use are:
list(STAN_THREADS = TRUE, PRECOMPILED_HEADERS = TRUE, STAN_CPP_OPTIMS = TRUE)
Disclaimer: Total noob at BLAS stuff, I have no idea what I’m doing.
Edit:
- Model is a simple Bernoulli GLM with 10 rnorm() predictors, and I use brms to generate the Stan code and data.
- OpenBLAS was installed with sudo apt-get install libopenblas-dev
- Both BLAS and LAPACK are set to OpenBLAS (/usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 and /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3)
- I have not changed the default OPENBLAS_NUM_THREADS (I have no idea what it’s doing)
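
For anyone trying to reproduce this, here is a minimal sketch of shell checks for the setup described above. It assumes a Debian/Ubuntu system where BLAS and LAPACK are managed through update-alternatives. The alternative names used below are the usual Ubuntu x86_64 ones but are an assumption here, and lapacke_status is just an illustrative variable name. The ldconfig check is only a rough proxy for what the linker sees.

```shell
#!/bin/sh
# Rough proxy: is an unversioned liblapacke.so (what -llapacke resolves to)
# present in the dynamic linker cache?
if ldconfig -p 2>/dev/null | grep -q 'liblapacke\.so '; then
  lapacke_status="found"
else
  lapacke_status="missing"
fi
echo "liblapacke: ${lapacke_status}"

# Show which implementations the BLAS/LAPACK alternatives point to.
# The alternative names are assumed; adjust them for other architectures.
for alt in libblas.so.3-x86_64-linux-gnu liblapack.so.3-x86_64-linux-gnu; do
  update-alternatives --display "$alt" 2>/dev/null \
    || echo "no alternatives entry for $alt"
done

# Report the OpenBLAS thread setting, if one is exported in this shell.
echo "OPENBLAS_NUM_THREADS=${OPENBLAS_NUM_THREADS:-<unset>}"
```

If the first check prints "missing", that would be consistent with the linker error above. The last line only reflects the current shell, not whatever environment R or CmdStanR launches the model in.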