Hi, I have a couple of things to ask / mention.
- GPU flags on Windows (CmdStan)
- Needed flags for Stan-math (PyStan)
CmdStan
I was able to get the GPU running on Windows with CmdStan (via the CmdStanPy interface) by following the instructions at https://github.com/stan-dev/math/wiki/OpenCL-GPU-Routines
The only change I needed was to the LDFLAGS_OPENCL= -L"$(CUDA_PATH)\lib\x64" -lOpenCL line. I changed it from
LDFLAGS_OPENCL= -L"C:/PROGRA~1/NVIDIA~2/CUDA/v10.1/lib/x64" -lOpenCL
to
LDFLAGS_OPENCL= -L"C:/PROGRA~1/NVIDIA~2/CUDA/v10.1/lib/x64" C:/Windows/System32/OpenCL.dll
Could someone explain why this works and -lOpenCL fails?
(I used the conda-installed mingw-w64 (gcc) and mingw32-make: conda install m2w64-toolchain -c msys2)
(The short path comes from cmdstanpy.utils.windows_short_path)
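One way to narrow down the -lOpenCL question is to check which files the linker could actually resolve that name to inside the -L directory. The sketch below is only a diagnostic, not an explanation of the failure. The candidate file names follow the search order for -l on Windows targets as I recall it from the GNU ld documentation (the Cygwin-specific cyg prefix is left out), so treat that list as an assumption. The helper names ld_candidates and find_link_candidates are made up for this example.

```python
import os


def ld_candidates(name):
    """File names MinGW ld is assumed to try for -l<name>, in order.

    Assumption: this mirrors the order described in the GNU ld
    documentation for Windows targets; verify against your ld version.
    """
    return [
        "lib{0}.dll.a".format(name),
        "{0}.dll.a".format(name),
        "lib{0}.a".format(name),
        "{0}.lib".format(name),
        "lib{0}.dll".format(name),
        "{0}.dll".format(name),
    ]


def find_link_candidates(lib_dir, name="OpenCL"):
    """Return the candidate files that exist in lib_dir (case-insensitive)."""
    if not os.path.isdir(lib_dir):
        return []
    # Map lower-cased names to the real names, since Windows ignores case.
    present = {entry.lower(): entry for entry in os.listdir(lib_dir)}
    found = []
    for candidate in ld_candidates(name):
        if candidate.lower() in present:
            found.append(present[candidate.lower()])
    return found
```

Calling find_link_candidates("C:/PROGRA~1/NVIDIA~2/CUDA/v10.1/lib/x64") and find_link_candidates("C:/Windows/System32") shows what each directory offers for the name OpenCL. An empty list for the CUDA directory would suggest the -L path is not being picked up at all.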
PyStan
I then tried to do the same with PyStan (I added the stan-math OpenCL directory to the include path inside pystan/model.py and passed extra_link_args).
My input is
stan_model = pystan.StanModel(
    model_code=stan_code,
    extra_compile_args=[
        "-DSTAN_OPENCL",
        "-DOPENCL_DEVICE_ID=0",
        "-DOPENCL_PLATFORM_ID=0",
    ],
    extra_link_args=[
        '-L"C:/PROGRA~1/NVIDIA~2/CUDA/v10.1/lib/x64"',
        "C:/Windows/System32/OpenCL.dll",
    ],
    verbose=True,
)
The output is
Compiling C:\Users\user\AppData\Local\Temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.pyx because it changed.
[1/1] Cythonizing C:\Users\user\AppData\Local\Temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.pyx
building 'stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004' extension
C:\Users\user\miniconda3\envs\stan\Library\mingw-w64\bin\gcc.exe -mdll -O -Wall -DMS_WIN64 -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -IC:\Users\user\AppData\Local\Temp\tmp0rlxxu53 -Ic:\users\user\github\pystan\pystan -Ic:\users\user\github\pystan\pystan\stan\src -Ic:\users\user\github\pystan\pystan\stan\lib\stan_math -Ic:\users\user\github\pystan\pystan\stan\lib\stan_math\lib\eigen_3.3.3 -Ic:\users\user\github\pystan\pystan\stan\lib\stan_math\lib\boost_1.69.0 -Ic:\users\user\github\pystan\pystan\stan\lib\stan_math\lib\sundials_4.1.0\include -Ic:\users\user\github\pystan\pystan\stan\lib\stan_math\lib\opencl_1.2.8 -IC:\Users\user\miniconda3\envs\stan\lib\site-packages\numpy\core\include -IC:\Users\user\miniconda3\envs\stan\include -IC:\Users\user\miniconda3\envs\stan\include -c C:\Users\user\AppData\Local\Temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.cpp -o c:\users\user\appdata\local\temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.o -O2 -ftemplate-depth-256 -Wno-unused-function -Wno-uninitialized -std=c++1y -D_hypot=hypot -pthread -fexceptions -DSTAN_OPENCL -DOPENCL_DEVICE_ID=0 -DOPENCL_PLATFORM_ID=0
writing c:\users\user\appdata\local\temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.cp37-win_amd64.def
C:\Users\user\miniconda3\envs\stan\Library\mingw-w64\bin\g++.exe -shared -s c:\users\user\appdata\local\temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.o c:\users\user\appdata\local\temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.cp37-win_amd64.def -LC:\Users\user\miniconda3\envs\stan\libs -LC:\Users\user\miniconda3\envs\stan\PCbuild\amd64 -lpython37 -lmsvcr140 -o C:\Users\user\AppData\Local\Temp\tmp0rlxxu53\stanfit4anon_model_9d0097b8e1c832bbbae3662f9bcf36e4_8566498776977557004.cp37-win_amd64.pyd -L"C:/PROGRA~1/NVIDIA~2/CUDA/v10.1/lib/x64" C:/Windows/System32/OpenCL.dll
This gets stuck at the linking step (the g++ -shared command above never returns), and I'm not sure what the problem is.
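To tell a genuine hang from a very slow link, the logged g++ command can be re-run by hand under a timeout. This is a minimal sketch using only the standard library. The helper name run_with_timeout is hypothetical, and the timeout value is arbitrary.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and classify the outcome.

    Returns a (status, output) pair where status is "ok", "error",
    or "timeout". stdout and stderr are captured together.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
            universal_newlines=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        return "timeout", ""
    status = "ok" if proc.returncode == 0 else "error"
    return status, proc.stdout
```

Passing the g++ line from the log as a list of arguments, with a timeout of a few minutes, separates "linker reports an error" from "linker never finishes". Adding -Wl,--verbose to the arguments asks ld to print which files it opens.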
cc @ariddell
What are the needed flags for Stan-math GPU support?
Model
The model is a GP that uses cholesky_decompose:
data {
  int<lower=1> N;
  real x[N];
  vector[N] y;
}
transformed data {
  vector[N] mu = rep_vector(0, N);
}
parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real<lower=0> sigma;
}
model {
  matrix[N, N] L_K;
  matrix[N, N] K = cov_exp_quad(x, alpha, rho);
  real sq_sigma = square(sigma);
  // diagonal elements
  for (n in 1:N)
    K[n, n] = K[n, n] + sq_sigma;
  L_K = cholesky_decompose(K);
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  y ~ multi_normal_cholesky(mu, L_K);
}
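For anyone who wants to see what the expensive part of the model block does, here is a rough pure-Python sketch of the same three steps: the squared-exponential kernel, the added noise on the diagonal, and the Cholesky factorization. It is only an illustration of the arithmetic, not how Stan Math implements it. The function names gp_cholesky and cholesky_lower are made up for this example.

```python
import math


def cov_exp_quad(x, alpha, rho):
    """Squared-exponential kernel: alpha^2 * exp(-(xi - xj)^2 / (2 * rho^2))."""
    return [
        [alpha ** 2 * math.exp(-((xi - xj) ** 2) / (2.0 * rho ** 2)) for xj in x]
        for xi in x
    ]


def cholesky_lower(K):
    """Lower-triangular L with L * L^T == K, for symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def gp_cholesky(x, alpha, rho, sigma):
    """Build K as in the model block and return (K, L_K)."""
    K = cov_exp_quad(x, alpha, rho)
    for n in range(len(x)):
        K[n][n] += sigma ** 2  # diagonal elements
    return K, cholesky_lower(K)
```

The factorization cost grows with the cube of N, which is why cholesky_decompose is the step that should benefit from the GPU for large N.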