Help setting up for GPU computation (OSX)

rok_cesnovar · January 28, 2021, 10:36am

Thanks for tagging @torkar

The instructions you linked are intended for working with OpenCL directly in C++ in Stan Math (there is much more OpenCL functionality implemented in Stan Math, and most of it has not made it to Stan yet).
I think that page is probably the first thing that comes up if you search for "Stan GPU" on Google.

Instructions for CmdStan and OpenCL are now available here: 14 Parallelization | CmdStan User’s Guide

They are very new and thus probably have not been indexed by searches yet.

Here is a cmdstanr example:

library(cmdstanr)

generator = function(seed = 0, n = 1000, k = 10) {
  set.seed(seed)
  X <- matrix(rnorm(n * k), ncol = k)
  
  y <- 3 * X[,1] - 2 * X[,2] + 1
  y <- ifelse(runif(n) < 1 / (1 + exp(-y)), 1, 0)
  
  list(k = ncol(X), n = nrow(X), y = y, X = X)
}

data <- generator(1, 100000, 20)

# we will write the data to a file ourselves
# so we don't do it twice for the GPU and CPU versions
data_file <- paste0(tempfile(), ".json")
write_stan_json(data, data_file)

opencl_options = list(
  stan_opencl = TRUE,
  opencl_platform_id = 0,
  opencl_device_id = 0 # in your case it's 1 here
)

model_code <- "
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y; // newer Stan releases require this array syntax (older form: int y[n];)
} 
 
parameters {
  vector[k] beta;
  real alpha;
} 

model {
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
"

stan_file <- write_stan_file(model_code)

# compile the model twice: once for the CPU and once with OpenCL enabled
mod <- cmdstan_model(stan_file)
mod_cl <- cmdstan_model(stan_file, cpp_options = opencl_options)

# use the same sampler settings for both so the timings are comparable
fit <- mod$sample(data = data_file, iter_sampling = 500, iter_warmup = 500, chains = 4, parallel_chains = 4, refresh = 0)
fit_cl <- mod_cl$sample(data = data_file, iter_sampling = 500, iter_warmup = 500, chains = 4, parallel_chains = 4, refresh = 0)

We get the following:

CPU

Running MCMC with 4 parallel chains...

Chain 1 finished in 104.1 seconds.
Chain 3 finished in 104.4 seconds.
Chain 2 finished in 104.9 seconds.
Chain 4 finished in 104.7 seconds.

All 4 chains finished successfully.
Mean chain execution time: 104.5 seconds.
Total execution time: 105.7 seconds.

GPU

Running MCMC with 4 parallel chains...

Chain 3 finished in 15.6 seconds.
Chain 1 finished in 15.7 seconds.
Chain 2 finished in 15.7 seconds.
Chain 4 finished in 16.0 seconds.

All 4 chains finished successfully.
Mean chain execution time: 15.7 seconds.
Total execution time: 17.0 seconds.
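
For a rough sense of the speedup, the times reported above can be compared directly. This is only a sketch using the numbers from this particular run; the variable names are made up for illustration, and the ratio will differ with other hardware, data sizes, and models.

```r
# Times copied from the sampler output above (seconds)
cpu_total <- 105.7  # CPU: total execution time
gpu_total <- 17.0   # GPU: total execution time
cpu_mean_chain <- 104.5  # CPU: mean chain execution time
gpu_mean_chain <- 15.7   # GPU: mean chain execution time

# Ratio of CPU time to GPU time; values above 1 mean the GPU run was faster
speedup_total <- cpu_total / gpu_total
speedup_chain <- cpu_mean_chain / gpu_mean_chain

cat(sprintf("Total time speedup: %.1fx\n", speedup_total))
cat(sprintf("Mean chain speedup: %.1fx\n", speedup_chain))
```

With the fitted objects still in the session, the same totals should also be available as fit$time()$total and fit_cl$time()$total, so the numbers do not have to be copied by hand.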

In this example devices are selected at compile time.

CmdStan 2.26 also supports runtime selection of devices, but that has not made it to cmdstanr yet (hopefully it will this week). I will also add a vignette with exactly this example.
