GPUs on Mac OSX, Apple M1


Dear Stan community,

I am trying to enable GPU (OpenCL) support in Stan, but I get the following error for every chain (only chain 1 is shown here):

Chain 1 Unrecoverable error evaluating the log probability at the initial value. 
Chain 1 Exception: calculate: clCreateKernel CL_INVALID_KERNEL: Unknown error -48 (in '/var/folders/xj/kxl6k76n6jx4j312zm8vb73r0000gn/T/Rtmp3J2wMr/model-12da272e71fb.stan', line 13, column 2 to column 57) 

To test the GPU, I am using the program provided in the thread "Help setting up for GPU computation (OSX)":

library(cmdstanr)

generator = function(seed = 0, n = 1000, k = 10) {
  set.seed(seed)
  X <- matrix(rnorm(n * k), ncol = k)
  
  y <- 3 * X[,1] - 2 * X[,2] + 1
  y <- ifelse(runif(n) < 1 / (1 + exp(-y)), 1, 0)
  
  list(k = ncol(X), n = nrow(X), y = y, X = X)
}

data <- generator(1, 100000, 20)

# we write the data to a file ourselves
# so we don't do it twice for the GPU and CPU versions
data_file <- paste0(tempfile(), ".json")
write_stan_json(data, data_file)

opencl_options = list(
  stan_opencl = TRUE,
  opencl_platform_id = 0,
  opencl_device_id = 0 # in your case it's 1 here
)

model_code <- "
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y;
} 
 
parameters {
  vector[k] beta;
  real alpha;
} 

model {
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
"

stan_file <- write_stan_file(model_code)

mod <- cmdstan_model(stan_file)
mod_cl <- cmdstan_model(stan_file, cpp_options = opencl_options)

fit <- mod$sample(data = data_file, iter_sampling = 500, iter_warmup = 500,
                  chains = 4, parallel_chains = 4, refresh = 0)
fit_cl <- mod_cl$sample(data = data_file, iter_sampling = 500, iter_warmup = 500,
                        chains = 4, parallel_chains = 4, refresh = 0)
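Once both fits complete, one way to check whether the OpenCL build actually helps is to compare their total run times. Below is a minimal sketch; compare_runtimes() is a hypothetical helper (not part of cmdstanr), and it assumes the fitted objects expose cmdstanr's $time() method, whose total element gives the elapsed time in seconds.

```r
# Hypothetical helper (not part of cmdstanr): compare the total run time
# of a CPU fit and an OpenCL fit. Assumes each fit has a $time() method
# returning a list whose `total` element is the elapsed time in seconds.
compare_runtimes <- function(fit_cpu, fit_gpu) {
  t_cpu <- fit_cpu$time()$total
  t_gpu <- fit_gpu$time()$total
  ratio <- t_cpu / t_gpu
  cat(sprintf("CPU: %.1f s, OpenCL: %.1f s, speedup: %.2fx\n",
              t_cpu, t_gpu, ratio))
  # Return the speedup invisibly so it can be stored if needed
  invisible(ratio)
}
```

After the two sample() calls above, it would be called as compare_runtimes(fit, fit_cl). The comparison is only meaningful once the OpenCL chains run without the error shown above.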

My clinfo output is:

clinfo -l
Platform #0: Apple
 `-- Device #0: Apple M1

and my sessionInfo():

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cmdstanr_0.4.0

loaded via a namespace (and not attached):
 [1] knitr_1.33           magrittr_2.0.1       tidyselect_1.1.1     munsell_0.5.0        colorspace_2.0-1     R6_2.5.0             rlang_0.4.11        
 [8] fansi_0.5.0          dplyr_1.0.6          tools_4.1.0          grid_4.1.0           checkmate_2.0.0      data.table_1.14.1    gtable_0.3.0        
[15] xfun_0.23            sessioninfo_1.1.1    utf8_1.2.1           cli_2.5.0            withr_2.4.2          posterior_0.1.6      ellipsis_0.3.2      
[22] abind_1.4-5          tibble_3.1.2         lifecycle_1.0.0      crayon_1.4.1         processx_3.5.2       tensorA_0.36.2       purrr_0.3.4         
[29] farver_2.1.0         ggplot2_3.3.3        vctrs_0.3.8          ps_1.6.0             glue_1.4.2           compiler_4.1.0       pillar_1.6.1        
[36] generics_0.1.0       scales_1.1.1         backports_1.2.1      distributional_0.2.2 jsonlite_1.7.2       pkgconfig_2.0.3 

Lastly, my R/Makevars is:

LLVM_LOC=/usr/local/opt/llvm
CC=$(LLVM_LOC)/bin/clang # -fopenmp
CXX=/usr/bin/clang++ # -fopenmp
# -O3 should be faster than the default -O2 optimisation level
CFLAGS=-g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe
CXXFLAGS=-g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe
LDFLAGS=-L/usr/local/opt/gettext/lib -L$(LLVM_LOC)/lib -Wl,-rpath,$(LLVM_LOC)/lib
# CPPFLAGS=-I/usr/local/opt/gettext/include -I$(LLVM_LOC)/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include

STAN_OPENCL=true
OPENCL_DEVICE_ID=0
OPENCL_PLATFORM_ID=0

(I modified it to install data.table properly.)

Any idea what could be going wrong? I suspect a problem with linking against some C++ libraries…

Maybe @rok_cesnovar has time to look into this?

Thanks Martin. I forgot about this one.

Unfortunately, I do not have access to an M1 Mac, and no one has really tested OpenCL on the M1 ARM chips. We know it works well with regular ARM CPUs on Linux, but this is a bit of a different beast.

I will try to take a look in the next few days and see whether I can find any information on this.


@rok_cesnovar Many thanks. I am afraid Apple decided to give up on OpenCL and use only its own framework (Metal). As usual with Apple and open libraries…


Hi all, I know this is a stale thread, but I am having the same issue and wondering whether anyone has found a resolution. Many thanks in advance.

Unfortunately, OpenCL acceleration with Stan on Apple Silicon is not possible. Apple does not provide a native OpenCL implementation on Apple Silicon; instead, it uses a translation layer on top of its Metal framework. This layer is known to be incompatible with some OpenCL extensions, and errors like the one above are common.


Thanks for the update. Frustrating to hear… although I have found that running my analysis on a MacBook Air with an M1 is faster than on my 5+-year-old PC using OpenCL with an NVIDIA GPU. I suppose that's some extension of Moore's Law.
