When I tried to compile my model it gave a bunch of errors like the following (even though OpenCL is included):
g++ -std=c++1y -pthread -Wno-sign-compare -I stan/lib/stan_math/lib/opencl_1.2.8 -O3 -I src -I stan/src -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.3.3 -I stan/lib/stan_math/lib/boost_1.69.0 -I stan/lib/stan_math/lib/sundials_4.1.0/include -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -DSTAN_OPENCL -DOPENCL_DEVICE_ID=0 -DOPENCL_PLATFORM_ID=0 -DCL_USE_DEPRECATED_OPENCL_1_2_APIS -D__CL_ENABLE_EXCEPTIONS -Wno-ignored-attributes -lOpenCL src/cmdstan/main.o stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_4.1.0/lib/libsundials_idas.a /home/eval/lmockus/cmdstan-2.20.0.gpu/cliff/cliff.o -o /home/eval/lmockus/cmdstan-2.20.0.gpu/cliff/cliff
src/cmdstan/main.o: In function `cl::detail::getPlatformVersion(_cl_platform_id*)':
main.cpp:(.text+0x38): undefined reference to `clGetPlatformInfo'
main.cpp:(.text+0x63): undefined reference to `clGetPlatformInfo'
That means the linker can't find the OpenCL library to link against. Are you using Windows or Linux? On Linux this should work if you installed the driver normally. On Windows you need to set a flag.
Windows flag: LDFLAGS_OPENCL= -L"$(CUDA_PATH)\lib\x64" -lOpenCL
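On Windows that flag typically goes into make/local in the CmdStan directory alongside the OpenCL switch. A minimal sketch, assuming a CUDA SDK provides the OpenCL library (adjust the path for your vendor's SDK):

```makefile
# make/local — illustrative example; the library path depends on your SDK
STAN_OPENCL=true
LDFLAGS_OPENCL= -L"$(CUDA_PATH)\lib\x64" -lOpenCL
```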
I keep forgetting that we had a bug in 2.20 that we fixed a day or two after the release, but there was no hotfix release. The next release is coming in 9 days.
For the time being I would recommend cloning the latest develop branch (git clone --single-branch https://github.com/stan-dev/cmdstan.git --recursive). That one does require git, unfortunately.
Can you share anything about the model you are trying to speed up? Thanks.
The model is in cliff.stan (3.3 KB)
It is a time series model with a neural network instead of an AR(1) term. It runs very slowly but uses matrix multiplication, so I thought a GPU might speed it up. I am also thinking about adding threading in order to use all available cores.
There are quite a few matrix multiplications in here, so you should see some speedup, depending on the sizes. At the moment I think you would benefit more from threading, given my quick inspection of the model, unless the matrix multiplications are 200x200 times 200x200 or larger.
Actually they are 200x10 matrices. Refactoring into map_rect form is a bit more complicated - the model is big enough already. The design matrix (X.) is different for each year and the calculations have to be done year by year. Perhaps each shard should contain the data for one year? I am just thinking out loud. That means each shard would have an unequal number of data points. Hopefully doable… The problem I am encountering is "trace ran beyond…", which kills the sampler - I posted about it a few days ago - so when it is resolved I will proceed with multithreading.
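For the unequal shard sizes, the usual trick with map_rect is to pad every shard's data out to the length of the longest year and pass the true count through the integer array. A minimal sketch of a shard function, with hypothetical names (shard_ll, the theta layout, and the likelihood are illustrative, not taken from cliff.stan):

```stan
functions {
  // One shard = one year. x_r is padded to the maximum year length;
  // x_i[1] holds the actual number of observations for this year.
  vector shard_ll(vector phi, vector theta, real[] x_r, int[] x_i) {
    int n = x_i[1];
    real lp = 0;
    for (i in 1:n)
      lp += normal_lpdf(x_r[i] | theta[1], exp(theta[2]));
    return [lp]';
  }
}
```

The padding entries beyond n are simply never touched, so the shards stay rectangular for map_rect while each year keeps its own effective length.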