Hi,
I am trying to compile Stan with GPU (OpenCL) support. I am not sure whether the problem is on my side, so I want to ask here before opening an issue. I am testing on two different machines and get a different error on each.
Ubuntu 20
The first is a Linux machine running Ubuntu 20. I have installed the NVIDIA drivers (latest version, 460) and nvidia-cuda-toolkit through apt-get, so the system uses CUDA 10.1. CmdStan was compiled through cmdstanpy. Once it was installed, I went directly into the directory where CmdStan is installed (i.e. no Python call) and executed:
./runTests.py test/unit -f opencl
following all the instructions outlined here: Stan Math Library: OpenCL CPU/GPU Support. Compilation took roughly two hours, and once it finished all the tests appear to have passed.
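For reference, the OpenCL settings I am using can also be written as `KEY=value` lines of the kind that go into CmdStan's make/local file. Below is a minimal sketch of that mapping. The helper `render_make_local` is a hypothetical name of mine and not part of cmdstanpy, and writing booleans as lowercase `true`/`false` is my assumption about the expected format.

```python
def render_make_local(options):
    """Render a dict of build flags as make/local style KEY=value lines.

    Hypothetical helper for illustration only; it is not a cmdstanpy function.
    """
    lines = []
    for key, value in options.items():
        # bool must be checked first because bool is a subclass of int.
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# The three OpenCL flags used in this post.
opencl_flags = {
    "STAN_OPENCL": True,
    "OPENCL_PLATFORM_ID": 0,
    "OPENCL_DEVICE_ID": 0,
}

make_local_text = render_make_local(opencl_flags)
print(make_local_text, end="")
```

The sketch only prints the text, so nothing in the CmdStan directory is modified.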
However, the problem appears when I compile a simple Stan program through the cmdstanpy interface. More precisely, I execute:
## Compile the program
from cmdstanpy import CmdStanModel

cpp_options = {
    # 'STAN_THREADS': True,
    # 'STAN_CPP_OPTIMS': True,
    'STAN_OPENCL': True,
    'OPENCL_PLATFORM_ID': 0,
    'OPENCL_DEVICE_ID': 0,
}
sm = CmdStanModel(stan_file='./stan_files/BNN.stan', cpp_options=cpp_options)
This gives the following error:
ERROR:cmdstanpy:file /home/jmaronasm/stan/stan_files/BNN.stan, exception ERROR
In file included from stan/lib/stan_math/stan/math/opencl/prim.hpp:86,
from stan/lib/stan_math/stan/math/prim.hpp:7,
from stan/src/stan/io/dump.hpp:7,
from src/cmdstan/command.hpp:24,
from src/cmdstan/main.cpp:1:
stan/lib/stan_math/stan/math/opencl/scalar_type.hpp:14:8: error: partial specialization of ‘struct stan::scalar_type<T, typename std::enable_if<stan::math::conjunction<stan::is_kernel_expression_and_not_scalar<T, void> >::value, void>::type>’ after instantiation of ‘struct stan::scalar_type<stan::math::constant_<int>, void>’ [-fpermissive]
14 | struct scalar_type<T, require_all_kernel_expressions_and_none_scalar_t<T>> {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [make/program:14: src/cmdstan/main_opencl.o] Error 1
ERROR:cmdstanpy:model compilation failed
Ubuntu 16
On another machine I have Ubuntu 16 installed. There, the nvidia-cuda-toolkit provided through apt-get is quite old (CUDA 7.5), so I installed CUDA 11.0 directly, together with the latest NVIDIA drivers (460), and added the CUDA paths to PATH and LD_LIBRARY_PATH. In this case the error I get when compiling is different and much longer than the one on Ubuntu 20:
ERROR:cmdstanpy:file /home/jmaronasm/Escritorio/phd/TRABAJANDO/CURRENT_PROJECTS/VIDRIOS_MODELO_JERÁRQUICO/FULLBayesian_HGM_Stan/stan_files/BNN.stan, exception ERROR
In file included from stan/lib/stan_math/stan/math/opencl/kernel_generator.hpp:133:0,
from stan/lib/stan_math/stan/math/opencl/rev/vari.hpp:7,
from stan/lib/stan_math/stan/math/rev/core/var.hpp:5,
from stan/lib/stan_math/stan/math/rev/core/profiling.hpp:6,
from src/cmdstan/write_profiling.hpp:4,
from src/cmdstan/command.hpp:17,
from src/cmdstan/main.cpp:1:
stan/lib/stan_math/stan/math/opencl/kernel_generator/multi_result_kernel.hpp: In instantiation of ‘stan::math::results_cl<T_results>::operator=(const stan::math::expressions_cl<T_expressions ...>&)::<lambda(auto:18 ...)> [with auto:18 = {std::integral_constant<long unsigned int, 0ul>}; T_expressions = {const stan::math::addition_operator_<stan::math::load_<stan::math::matrix_cl<double, void>&>, stan::math::elt_multiply_<stan::math::load_<const stan::math::matrix_cl<double, void>&>, stan::math::trigamm CONTINUES HERE
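Regarding the PATH and LD_LIBRARY_PATH setup on the Ubuntu 16 machine, a quick way to confirm that the CUDA directories are really visible to the process is a check like the one below. This is only a sketch: the two directories are assumed install locations for CUDA 11.0, not necessarily the ones on my machine, and the function names are hypothetical.

```python
import os

# Assumed (hypothetical) CUDA 11.0 install locations; adjust to the real ones.
CUDA_BIN = "/usr/local/cuda-11.0/bin"
CUDA_LIB = "/usr/local/cuda-11.0/lib64"


def dirs_on_search_path(env, var, wanted):
    """Return {directory: True/False} for whether each directory is listed in env[var]."""
    entries = [
        os.path.normpath(p)
        for p in env.get(var, "").split(os.pathsep)
        if p  # skip empty entries such as those from a trailing separator
    ]
    return {d: os.path.normpath(d) in entries for d in wanted}


def check_cuda_env(env=None):
    """Check PATH and LD_LIBRARY_PATH for the assumed CUDA directories."""
    env = os.environ if env is None else env
    return {
        "PATH": dirs_on_search_path(env, "PATH", [CUDA_BIN]),
        "LD_LIBRARY_PATH": dirs_on_search_path(env, "LD_LIBRARY_PATH", [CUDA_LIB]),
    }


if __name__ == "__main__":
    # Prints a nested dict of booleans for the current environment.
    print(check_cuda_env())
```

This only tells whether the directories are listed in the variables. It says nothing about whether the compiler errors above are related to them.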
In both cases, the errors do not seem to be related to missing libraries at link time or anything similar. Any thoughts before I open an issue?
Thank you!