Cmdstan samples extremely slowly with GPU

andrjohns · March 19, 2022, 8:36am

When using brms with OpenCL acceleration, you will only see a benefit if the Stan code that brms generates uses functions that can be GPU-accelerated. Stan has a categorical_logit_glm distribution which can be GPU-accelerated. However, brms generates code which uses the categorical_logit distribution, which is not GPU-accelerated:

library(brms)

tmp_data <- data.frame(outcome = sample(1:4, 10, replace = TRUE),
                      pred = rnorm(10))

make_stancode(outcome ~ pred,
              data = tmp_data,
              family = categorical("logit"),
              backend = "cmdstanr",
              opencl = opencl(c(0,0)))

Produces:

...
    for (n in 1 : N) {
      target += categorical_logit_lpmf(Y[n] | mu[n]);
    }

This is because the categorical_logit_glm distribution is not available in the current version of rstan, and brms has to remain compatible with both the rstan and cmdstanr backends. Note that this is also mentioned in the brms::opencl documentation:

"Only some Stan functions can be run on a GPU at this point and so a lot of brms models won't benefit from OpenCL for now."

If you’re going to be working with very large datasets that require days of computation time, you should most likely look to use Stan code itself (through cmdstanr or similar) and tune/optimise as needed, as brms has to generate code for maximum flexibility and compatibility, rather than speed and efficiency.
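
As a rough illustration of what writing the Stan code yourself could look like, here is a minimal sketch of the same kind of model using categorical_logit_glm_lpmf in place of the loop over categorical_logit_lpmf. This is not what brms generates. The variable names, the normal(0, 5) priors, and the choice to pin the first category's coefficients to zero for identifiability are all assumptions made for the example:

```stan
data {
  int<lower=1> N;                      // number of observations
  int<lower=2> C;                      // number of outcome categories
  int<lower=1> K;                      // number of predictors
  array[N] int<lower=1, upper=C> Y;    // outcome category per observation
  matrix[N, K] X;                      // predictors, without an intercept column
}
parameters {
  vector[C - 1] alpha_free;            // intercepts for categories 2..C
  matrix[K, C - 1] beta_free;          // slopes for categories 2..C
}
transformed parameters {
  // Assumption: category 1 is the reference, so its intercept and slopes are 0.
  vector[C] alpha = append_row(0, alpha_free);
  matrix[K, C] beta = append_col(rep_vector(0, K), beta_free);
}
model {
  // Placeholder priors, chosen only so the sketch is complete.
  alpha_free ~ normal(0, 5);
  to_vector(beta_free) ~ normal(0, 5);

  // One vectorised GLM call over all N observations.
  target += categorical_logit_glm_lpmf(Y | X, alpha, beta);
}
```

With cmdstanr, a model like this would be compiled with cpp_options = list(stan_opencl = TRUE) and sampled with opencl_ids = c(0, 0) (adjust the platform and device IDs to your system). Whether the GPU actually beats the CPU will still depend on N, K and your hardware, so it is worth timing both.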

Note that this discussion has strayed from the original topic, so I’d recommend opening a new topic if you have any more questions.
