Stan is not working on GPU in Linux

Hi,

I am trying to compile Stan with GPU support. I am not sure whether the problem is on my side, so I want to ask before opening an issue. I am testing this on two different setups and getting different errors.

Ubuntu 20

On one side, I am on a Linux machine with Ubuntu 20. I installed the drivers (latest version, 460) and nvidia-cuda-toolkit through apt-get, so the system is using CUDA 10.1. CmdStan was installed through cmdstanpy. Once it was installed, I went directly into the directory where CmdStan lives (i.e. no Python call) and executed ./runTests.py test/unit -f opencl, following all the instructions outlined in "Stan Math Library: OpenCL CPU/GPU Support". The tests took roughly two hours to compile and run, and once finished it seems all of them passed.

However, when I launch a simple Stan program through the cmdstanpy interface, compilation fails. More precisely, I execute:

## Compile the program
from cmdstanpy import CmdStanModel

cpp_options = {
    # 'STAN_THREADS': True,
    # 'STAN_CPP_OPTIMS': True,
    'STAN_OPENCL': True,
    'OPENCL_PLATFORM_ID': 0,
    'OPENCL_DEVICE_ID': 0,
}

sm = CmdStanModel(stan_file='./stan_files/BNN.stan', cpp_options=cpp_options)

And get the following error:

ERROR:cmdstanpy:file /home/jmaronasm/stan/stan_files/BNN.stan, exception ERROR
In file included from stan/lib/stan_math/stan/math/opencl/prim.hpp:86,
                 from stan/lib/stan_math/stan/math/prim.hpp:7,
                 from stan/src/stan/io/dump.hpp:7,
                 from src/cmdstan/command.hpp:24,
                 from src/cmdstan/main.cpp:1:
stan/lib/stan_math/stan/math/opencl/scalar_type.hpp:14:8: error: partial specialization of ‘struct stan::scalar_type<T, typename std::enable_if<stan::math::conjunction<stan::is_kernel_expression_and_not_scalar<T, void> >::value, void>::type>’ after instantiation of ‘struct stan::scalar_type<stan::math::constant_<int>, void>’ [-fpermissive]
   14 | struct scalar_type<T, require_all_kernel_expressions_and_none_scalar_t<T>> {
      |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
make: *** [make/program:14: src/cmdstan/main_opencl.o] Error 1 
ERROR:cmdstanpy:model compilation failed

Ubuntu 16

On another machine I have Ubuntu 16 installed. In this case, the nvidia-cuda-toolkit provided through apt-get is quite old (CUDA 7.5). For that reason I installed CUDA 11.0 directly, together with the latest NVIDIA drivers (460), and added the CUDA paths to PATH and LD_LIBRARY_PATH. Here the error I get when compiling is different and much longer than the one on Ubuntu 20:

ERROR:cmdstanpy:file /home/jmaronasm/Escritorio/phd/TRABAJANDO/CURRENT_PROJECTS/VIDRIOS_MODELO_JERÁRQUICO/FULLBayesian_HGM_Stan/stan_files/BNN.stan, exception ERROR
In file included from stan/lib/stan_math/stan/math/opencl/kernel_generator.hpp:133:0,
                 from stan/lib/stan_math/stan/math/opencl/rev/vari.hpp:7,
                 from stan/lib/stan_math/stan/math/rev/core/var.hpp:5,
                 from stan/lib/stan_math/stan/math/rev/core/profiling.hpp:6,
                 from src/cmdstan/write_profiling.hpp:4,
                 from src/cmdstan/command.hpp:17,
                 from src/cmdstan/main.cpp:1:
stan/lib/stan_math/stan/math/opencl/kernel_generator/multi_result_kernel.hpp: In instantiation of ‘stan::math::results_cl<T_results>::operator=(const stan::math::expressions_cl<T_expressions ...>&)::<lambda(auto:18 ...)> [with auto:18 = {std::integral_constant<long unsigned int, 0ul>}; T_expressions = {const stan::math::addition_operator_<stan::math::load_<stan::math::matrix_cl<double, void>&>, stan::math::elt_multiply_<stan::math::load_<const stan::math::matrix_cl<double, void>&>, stan::math::trigamm [output truncated here; the error continues]

In both cases, the errors do not seem to be related to missing libraries at link time or anything similar. Any thoughts before I open an issue?

Thank you!


@mitzimorris

I am taking a look. There is one additional similar report on Discourse right now.


Thank you, please keep me in the loop if possible. Can you point me to the thread where this is being discussed?

There is no other discussion going on right now, but let's continue in the thread "Partial specialization error when compiling model with opencl enabled".

It seems that something is wrong with the 2.26.1 release in this respect. I can replicate it locally. Not sure what happened. It does work with 2.26.0. I will dig deeper and report back.


It seems that g++ 9.3.0 does not like something about our OpenCL backend in 2.26.1. It's fixed on develop, but that obviously won't help you right now.

Run:

cmdstan_make_local(cpp_options = "CXXFLAGS += -fpermissive")
rebuild_cmdstan(cores = 4)

and then try again.

The other solution is to switch to using clang++ for now:

cmdstan_make_local(cpp_options = "CXX = clang++")
rebuild_cmdstan(cores = 4)

Sorry for the inconvenience.


As I am working in cmdstanpy, and it seems there is no function like the one in R to make these changes, my solution has been to run:

install_cmdstan --overwrite --version 2.26.0 

and then point cmdstanpy at that installation through set_cmdstan_path (see "Installation" in the CmdStanPy 0.9.64 documentation). Now it works.

Posting it here for Python users.
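
The second step above can be sketched in Python. This is only an illustration under assumptions: `find_cmdstan_install` is a hypothetical helper, not part of cmdstanpy, and the base directory where install_cmdstan places its `cmdstan-<version>` folders depends on the cmdstanpy version, so it is passed in explicitly. set_cmdstan_path is the cmdstanpy function mentioned above; it appears only in a comment so the block runs without cmdstanpy installed.

```python
from pathlib import Path


def find_cmdstan_install(base_dir, version):
    """Return the path of <base_dir>/cmdstan-<version>, or raise if it is missing.

    Hypothetical helper for illustration; the folder naming scheme
    "cmdstan-<version>" is an assumption about the install layout.
    """
    candidate = Path(base_dir).expanduser() / f"cmdstan-{version}"
    if not candidate.is_dir():
        raise FileNotFoundError(f"no CmdStan installation at {candidate}")
    return str(candidate)


# Intended use (requires cmdstanpy; the base directory shown is an assumption):
#   from cmdstanpy import set_cmdstan_path
#   set_cmdstan_path(find_cmdstan_install("~/.cmdstanpy", "2.26.0"))
```

Failing early with a clear error when the folder is missing avoids pointing cmdstanpy at a path that does not exist.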

Oh, sorry, I completely forgot that your case is cmdstanpy. In that case I would advise going to the CmdStan installation folder and writing

CXXFLAGS += -fpermissive

to the make/local file. It most likely doesn't exist, so just create one.
I would advise against using 2.26.0.
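
For cmdstanpy users, the make/local edit described above can also be scripted. The following is a minimal sketch, not an official cmdstanpy facility: the helper name `add_make_local_line` is hypothetical, and it only assumes that the CmdStan folder contains (or may contain) a make/ subfolder. It appends a line to make/local, creating the file if needed, and does nothing when the line is already present.

```python
from pathlib import Path


def add_make_local_line(cmdstan_dir, line="CXXFLAGS += -fpermissive"):
    """Append `line` to <cmdstan_dir>/make/local unless it is already there.

    Hypothetical helper for illustration. Returns True if the file was changed.
    """
    make_local = Path(cmdstan_dir) / "make" / "local"
    make_local.parent.mkdir(parents=True, exist_ok=True)

    # Read the current contents, treating a missing file as empty.
    text = make_local.read_text() if make_local.exists() else ""
    if line.strip() in (entry.strip() for entry in text.splitlines()):
        return False  # the line is already set, leave the file untouched

    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with make_local.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True
```

With cmdstanpy installed, the folder can be taken from `cmdstanpy.cmdstan_path()`, for example `add_make_local_line(cmdstanpy.cmdstan_path())`. The same helper could write `CXX = clang++` for the clang++ alternative. The R snippet earlier in the thread also rebuilds CmdStan after changing the flags; whether a rebuild is needed in the cmdstanpy case is not confirmed here.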


That works, thank you.
