Stan compile with Apple Accelerate BLAS / LAPACK?

Is there a way to have Stan use Apple’s versions of BLAS / LAPACK from the Accelerate framework on Apple silicon machines?

Two Eigen documentation pages seem useful: "Using BLAS/LAPACK from Eigen", which discusses using Apple Accelerate as the backend, and "Eigen: AccelerateSupport module".

I used MacPorts to install lapacke per the instructions linked above, and then updated make/local through cmdstanr by modifying what I found in this previous Stan post:

cmdstan_make_local(cpp_options=list(
  STAN_THREADS=TRUE, STAN_NO_RANGE_CHECKS=TRUE,
  LDLIBS = "-lblas -llapack -llapacke",
  CXXFLAGS = "-mcpu=native -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE -framework Accelerate /opt/local/lib/lapack/liblapacke.dylib",
  CXXFLAGS_OPTIM="-mcpu=native", 
  CXXFLAGS_OPTIM_TBB="-mcpu=native",
  CXXFLAGS_OPTIM_SUNDIALS="-mcpu=native"
),
append=FALSE)

rebuild_cmdstan(cores = 8)

But the build failed with the error:

ld: library not found for -llapacke
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The error appears in the following context:

clang++ -E -x c++ ../tbb_2020.3/src/tbbmalloc/mac64-tbbmalloc-export.def -O2 -DUSE_PTHREAD  -stdlib=libc++ -arch arm64 -mmacosx-version-min=10.11  -Wall -Wno-unknown-warning-option -Wno-deprecated-copy -mcpu=native  -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1  -fno-rtti -fno-exceptions -D__TBBMALLOC_BUILD=1  -Wno-non-virtual-dtor -Wno-dangling-else -I../tbb_2020.3/src -I../tbb_2020.3/src/rml/include -I../tbb_2020.3/include > tbbmalloc.def
clang++ -mcpu=native -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE -framework Accelerate /opt/local/lib/lapack/liblapacke.dylib -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes     -DSTAN_THREADS -I stan/lib/stan_math/lib/tbb_2020.3/include  -mcpu=native -DSTAN_NO_RANGE_CHECKS -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.78.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials    -DBOOST_DISABLE_ASSERTS        -DSTAN_NO_RANGE_CHECKS       -Wl,-L,"/Users/ssp3nc3r/.cmdstan/cmdstan-2.32.2/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/ssp3nc3r/.cmdstan/cmdstan-2.32.2/stan/lib/stan_math/lib/tbb"        bin/cmdstan/stansummary.o -lblas -llapack -llapacke       -Wl,-L,"/Users/ssp3nc3r/.cmdstan/cmdstan-2.32.2/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/ssp3nc3r/.cmdstan/cmdstan-2.32.2/stan/lib/stan_math/lib/tbb"     -o bin/stansummary
clang++ -c -MMD -O2 -DUSE_PTHREAD  -stdlib=libc++ -arch arm64 -mmacosx-version-min=10.11  -Wall -Wno-unknown-warning-option -Wno-deprecated-copy -mcpu=native  -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1  -Wno-non-virtual-dtor -Wno-dangling-else -fPIC  -D__TBBMALLOC_BUILD=1 -I../tbb_2020.3/src -I../tbb_2020.3/src/rml/include -I../tbb_2020.3/include -I../tbb_2020.3/src/tbbmalloc -I../tbb_2020.3/src/tbbmalloc ../tbb_2020.3/src/tbbmalloc/proxy.cpp
ld: library not found for -llapacke
clang: error: linker command failed with exit code 1 (use -v to see invocation)
clang++ -c -MMD -O2 -DUSE_PTHREAD  -stdlib=libc++ -arch arm64 -mmacosx-version-min=10.11  -Wall -Wno-unknown-warning-option -Wno-deprecated-copy -mcpu=native  -DTBB_SUPPRESS_DEPRECATED_MESSAGES=1  -Wno-non-virtual-dtor -Wno-dangling-else -fPIC  -D__TBBMALLOC_BUILD=1 -I../tbb_2020.3/src -I../tbb_2020.3/src/rml/include -I../tbb_2020.3/include -I../tbb_2020.3/src/tbbmalloc -I../tbb_2020.3/src/tbbmalloc ../tbb_2020.3/src/tbbmalloc/tbb_function_replacement.cpp
make: *** [bin/stansummary] Error 1
make: *** Waiting for unfinished jobs....
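One possible reading of this failure is that the linker was never given a search directory for liblapacke: the full .dylib path sits in CXXFLAGS, but `-llapacke` from LDLIBS is still resolved against the linker's search path, which does not include the MacPorts directory. The sketch below is a hypothetical helper (`find_lapacke_dir` is not part of CmdStan) that looks for the library in the two install locations mentioned in this thread and prints a matching `-L` flag.

```shell
#!/bin/sh
# Hypothetical helper: print a -L flag for the first directory that
# contains a liblapacke library. With no arguments it checks the two
# locations mentioned in this thread (Homebrew and MacPorts); pass
# other directories as arguments to override them.
find_lapacke_dir() {
  if [ "$#" -eq 0 ]; then
    set -- /opt/homebrew/opt/lapack/lib /opt/local/lib/lapack
  fi
  for dir in "$@"; do
    for ext in dylib so a; do
      if [ -e "$dir/liblapacke.$ext" ]; then
        printf '%s\n' "-L$dir"
        return 0
      fi
    done
  done
  echo "liblapacke not found in: $*" >&2
  return 1
}
```

If the function prints a flag, that flag is a candidate for LDFLAGS in make/local, so that `-llapacke` can be resolved at link time.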

P.S. Using Apple Accelerate in R provides substantial speedups. You can activate it like so:

cd /Library/Frameworks/R.framework/Resources/lib/

ln -s -i -v libRblas.vecLib.dylib libRblas.dylib
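To confirm which BLAS the symlink ends up pointing at, something like the following could be used. It is a minimal sketch: `r_blas_target` is a hypothetical helper name, and the default directory is simply the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the target of the libRblas.dylib symlink.
# $1: R library directory (defaults to the path used in this post).
r_blas_target() {
  lib_dir=${1:-/Library/Frameworks/R.framework/Resources/lib}
  if [ -L "$lib_dir/libRblas.dylib" ]; then
    readlink "$lib_dir/libRblas.dylib"
  else
    echo "libRblas.dylib in $lib_dir is not a symlink" >&2
    return 1
  fi
}
```

After the `ln -s` step, calling `r_blas_target` with no arguments should report the vecLib library if the link was created as intended.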

Aki has written about this for Linux; maybe there are some useful things to take from the thread: Speedup by using external BLAS/LAPACK with CmdStan and CmdStanR/Py

Thanks for pulling together these links and giving this a try, @ssp3nc3r. Did you ever get any further with this? I’m about to try it myself and haven’t found any other resources.

No, I had other things come up and haven’t got back to it yet. Will be interested in anyone making progress though.

You were close. This works for me:

STAN_THREADS=TRUE
STAN_NO_RANGE_CHECKS=TRUE
LDLIBS = -lblas -llapack -llapacke
CXXFLAGS += -mcpu=native -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE
LDFLAGS += -framework Accelerate -L/opt/homebrew/opt/lapack/lib
CXXFLAGS_OPTIM=-mcpu=native
CXXFLAGS_OPTIM_TBB=-mcpu=native
CXXFLAGS_OPTIM_SUNDIALS=-mcpu=native

Note that this uses the Homebrew lapacke libraries, which you get with brew install lapack. The settings you posted earlier were tailored for MacPorts (as suggested by Eigen), but Homebrew seems to work just as well.
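For anyone who edits make/local outside of cmdstanr, the settings above could be written with a small script along these lines. This is a sketch under assumptions: `write_make_local` is a hypothetical helper, the CmdStan directory is whatever you pass in, and the library directory defaults to the Homebrew path from this post (pass the MacPorts directory as the second argument instead if that is where liblapacke lives).

```shell
#!/bin/sh
# Hypothetical helper: write the make/local settings from this post.
# $1: CmdStan directory (must contain a make/ subdirectory)
# $2: directory holding liblapacke (defaults to the Homebrew path above)
write_make_local() {
  cmdstan_dir=$1
  lapack_lib=${2:-/opt/homebrew/opt/lapack/lib}
  if [ ! -d "$cmdstan_dir/make" ]; then
    echo "no make/ directory in $cmdstan_dir" >&2
    return 1
  fi
  # Keep a copy of any existing settings instead of discarding them.
  if [ -f "$cmdstan_dir/make/local" ]; then
    cp "$cmdstan_dir/make/local" "$cmdstan_dir/make/local.bak"
  fi
  cat > "$cmdstan_dir/make/local" <<EOF
STAN_THREADS=TRUE
STAN_NO_RANGE_CHECKS=TRUE
LDLIBS = -lblas -llapack -llapacke
CXXFLAGS += -mcpu=native -DEIGEN_USE_BLAS -DEIGEN_USE_LAPACKE
LDFLAGS += -framework Accelerate -L$lapack_lib
CXXFLAGS_OPTIM=-mcpu=native
CXXFLAGS_OPTIM_TBB=-mcpu=native
CXXFLAGS_OPTIM_SUNDIALS=-mcpu=native
EOF
}
```

CmdStan has to be rebuilt afterwards for the new flags to take effect, for example with rebuild_cmdstan() from cmdstanr as in the original post.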

Thanks for making me aware of the Accelerate framework.

With the above settings I just compiled cmdstan 2.33.0 and the Bernoulli example…no tests yet.
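As a first sanity check short of real tests, one could inspect whether a compiled model binary actually links against Accelerate. The sketch below assumes macOS with Apple's `otool` available; both function names are hypothetical, and the check only shows that the framework is linked, not that Eigen routes any particular operation through it.

```shell
#!/bin/sh
# Succeeds if the `otool -L` listing read from stdin mentions Accelerate.
mentions_accelerate() {
  grep -q 'Accelerate\.framework'
}

# Check one compiled binary. otool ships with Apple's developer tools,
# so this part only works on macOS.
check_binary() {
  if ! command -v otool >/dev/null 2>&1; then
    echo "otool not available (not macOS?)" >&2
    return 2
  fi
  otool -L "$1" | mentions_accelerate
}
```

From the CmdStan directory, `check_binary examples/bernoulli/bernoulli && echo "links Accelerate"` would be one way to use it on the Bernoulli example. A timing comparison against a build without these flags would still be needed to see whether there is any speedup.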
