Rebuilding CmdStan with OpenCL on macOS

I have a model in which the computational bottleneck (AFAICT) is a large matrix multiplication operation, and I’m trying to see if I can get additional speedups from utilising the GPU.

I have CmdStanR 0.1.3 installed, as well as CmdStan 2.25, on macOS Mojave 10.14.6.
Running clinfo -l gives:

Platform #0: Apple
 +-- Device #0: Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz
 +-- Device #1: Intel(R) HD Graphics 630
 `-- Device #2: AMD Radeon Pro 560 Compute Engine

So, following some previous discussions on the forums, I edited .cmdstanr/cmdstan-2.25.0/make/local (which was empty) to

STAN_OPENCL=true
OPENCL_DEVICE_ID=2
OPENCL_PLATFORM_ID=0

Then I ran rebuild_cmdstan() and got:

clang++ -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes   -I stan/lib/stan_math/lib/opencl_2.2.0   -I stan/lib/stan_math/lib/tbb_2019_U8/include   -O3 -I src -I stan/src -I lib/rapidjson_1.1.0/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.7 -I stan/lib/stan_math/lib/boost_1.72.0 -I stan/lib/stan_math/lib/sundials_5.2.0/include    -DBOOST_DISABLE_ASSERTS  -DSTAN_OPENCL -DOPENCL_DEVICE_ID=2 -DOPENCL_PLATFORM_ID=0 -DCL_HPP_TARGET_OPENCL_VERSION=120 -DCL_HPP_MINIMUM_OPENCL_VERSION=120 -DCL_HPP_ENABLE_EXCEPTIONS -Wno-ignored-attributes            -Wl,-L,"/Users/adamhaber/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/adamhaber/.cmdstanr/cmdstan-2.25.0/stan/lib/stan_math/lib/tbb"      bin/cmdstan/diagnose.o stan/lib/stan_math/lib/boost_1.72.0/stage/lib/libboost_program_options.a stan/lib/stan_math/lib/boost_1.72.0/stage/lib/libboost_program_options.a      -framework OpenCL   -o bin/diagnose
Undefined symbols for architecture x86_64:
  "tbb::internal::task_scheduler_observer_v3::observe(bool)", referenced from:
      stan::math::ad_tape_observer::ad_tape_observer() in diagnose.o
      tbb::interface6::task_scheduler_observer::~task_scheduler_observer() in diagnose.o
      tbb::interface6::task_scheduler_observer::~task_scheduler_observer() in diagnose.o
      tbb::interface6::task_scheduler_observer::~task_scheduler_observer() in diagnose.o
      tbb::internal::task_scheduler_observer_v3::~task_scheduler_observer_v3() in diagnose.o
      tbb::internal::task_scheduler_observer_v3::~task_scheduler_observer_v3() in diagnose.o
      tbb::internal::task_scheduler_observer_v3::~task_scheduler_observer_v3() in diagnose.o
      ...
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [bin/diagnose] Error 1
make: *** Waiting for unfinished jobs....

Any help would be much appreciated!

Hi,

You do not need to put this in make/local when using cmdstanr.
So the steps would be:

  • clear make/local (remove the three lines you added)
  • run rebuild_cmdstan() (with the now-empty make/local)
  • build your model with:
opencl_options = list(
  stan_opencl = TRUE,
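  opencl_platform_id = 0,  # Platform #0 (Apple) in the clinfo -l output
  opencl_device_id = 2     # Device #2 (AMD Radeon Pro 560) in the clinfo -l output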
)

mod <- cmdstan_model(model_path, cpp_options = opencl_options)

The first compilation might take a while, but after that it should be fine.
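
For the first two steps, a minimal sketch in R might look like the following. It assumes CmdStan lives at the path shown in the original post (~/.cmdstanr/cmdstan-2.25.0) and that make/local contains nothing you want to keep, since the file is truncated to empty. The guards are only there so the snippet does nothing if that file or cmdstanr is missing.

```r
# Assumed install location, taken from the path in the original post.
make_local <- path.expand("~/.cmdstanr/cmdstan-2.25.0/make/local")

# Only act if that make/local exists and cmdstanr is installed.
if (file.exists(make_local) && requireNamespace("cmdstanr", quietly = TRUE)) {
  # file.create() truncates an existing file, leaving make/local empty.
  file.create(make_local)

  # Rebuild CmdStan without any OpenCL flags in make/local.
  cmdstanr::rebuild_cmdstan()
}
```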

Thanks Rok!

I was able to compile the model, but it seems that regardless of what I write in opencl_options, I always get (from CmdStan this time):

opencl_platform = Apple
opencl_device = Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz

instead of the AMD Radeon Pro 560 Compute Engine. Any idea what might cause this?

Can you check if the GPU is being used in some sort of system monitor?

Doesn’t seem like it, at least when I use the Activity Monitor’s GPU History.

Oh, are you recompiling the model? If you only change the IDs the model will not recompile automatically. You need to set force_recompile = TRUE.
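
As a rough sketch, forcing the rebuild could look like this. The IDs are read off the clinfo -l listing earlier in the thread (platform #0, device #2), and "model.stan" is a hypothetical file name standing in for your own model. The guard just makes the snippet a no-op when cmdstanr or the file is not available.

```r
# Hypothetical path; replace with your own Stan file.
model_path <- "model.stan"

# IDs as listed by `clinfo -l`: platform #0 (Apple), device #2 (AMD Radeon Pro 560).
opencl_options <- list(
  stan_opencl = TRUE,
  opencl_platform_id = 0,
  opencl_device_id = 2
)

# Compile only if cmdstanr is installed and the model file exists.
if (requireNamespace("cmdstanr", quietly = TRUE) && file.exists(model_path)) {
  mod <- cmdstanr::cmdstan_model(
    model_path,
    cpp_options = opencl_options,
    force_recompile = TRUE  # rebuild even though the Stan code itself is unchanged
  )
}
```

After this, the opencl_device line printed by CmdStan should be the place to check whether the change took effect.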

I was recompiling because I had to change something in the model - Stan complained about a cholesky_decompose in transformed data, which I assumed is a good sign.