GPUs on Mac OSX, Apple M1

Dear Stan community,

I am trying to enable GPUs for Stan, but I get the following error for all the chains (only chain 1 reported here):

Chain 1 Unrecoverable error evaluating the log probability at the initial value. 
Chain 1 Exception: calculate: clCreateKernel CL_INVALID_KERNEL: Unknown error -48 (in '/var/folders/xj/kxl6k76n6jx4j312zm8vb73r0000gn/T/Rtmp3J2wMr/model-12da272e71fb.stan', line 13, column 2 to column 57) 

To test the GPUs, I am using the program provided in the thread "Help setting up for GPU computation (OSX)":

library(cmdstanr)

generator = function(seed = 0, n = 1000, k = 10) {
  set.seed(seed)
  X <- matrix(rnorm(n * k), ncol = k)
  
  y <- 3 * X[,1] - 2 * X[,2] + 1
  y <- ifelse(runif(n) < 1 / (1 + exp(-y)), 1, 0)
  
  list(k = ncol(X), n = nrow(X), y = y, X = X)
}

data <- generator(1, 100000, 20)

# we will write the data to a file ourselves
# so we don't do it twice for the GPU and CPU versions
data_file <- paste0(tempfile(), ".json")
write_stan_json(data, data_file)

opencl_options = list(
  stan_opencl = TRUE,
  opencl_platform_id = 0,
  opencl_device_id = 0 # in your case it's 1 here
)

model_code <- "
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  int y[n];
} 
 
parameters {
  vector[k] beta;
  real alpha;
} 

model {
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
"

stan_file <- write_stan_file(model_code)

mod <- cmdstan_model(stan_file)
mod_cl <- cmdstan_model(stan_file, cpp_options = opencl_options)

fit <- mod$sample(data = data_file, iter_sampling = 500, iter_warmup = 500, chains = 4, parallel_chains = 4, refresh = 0)
fit_cl <- mod_cl$sample(data = data_file, iter_sampling = 500, iter_warmup = 500, chains = 4, parallel_chains = 4, refresh = 0)

My clinfo output is:

clinfo -l                        
Platform #0: Apple
 `-- Device #0: Apple M1

and my sessionInfo():

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] cmdstanr_0.4.0

loaded via a namespace (and not attached):
 [1] knitr_1.33           magrittr_2.0.1       tidyselect_1.1.1     munsell_0.5.0        colorspace_2.0-1     R6_2.5.0             rlang_0.4.11        
 [8] fansi_0.5.0          dplyr_1.0.6          tools_4.1.0          grid_4.1.0           checkmate_2.0.0      data.table_1.14.1    gtable_0.3.0        
[15] xfun_0.23            sessioninfo_1.1.1    utf8_1.2.1           cli_2.5.0            withr_2.4.2          posterior_0.1.6      ellipsis_0.3.2      
[22] abind_1.4-5          tibble_3.1.2         lifecycle_1.0.0      crayon_1.4.1         processx_3.5.2       tensorA_0.36.2       purrr_0.3.4         
[29] farver_2.1.0         ggplot2_3.3.3        vctrs_0.3.8          ps_1.6.0             glue_1.4.2           compiler_4.1.0       pillar_1.6.1        
[36] generics_0.1.0       scales_1.1.1         backports_1.2.1      distributional_0.2.2 jsonlite_1.7.2       pkgconfig_2.0.3 

Lastly, my R/Makevars is:

LLVM_LOC=/usr/local/opt/llvm
CC=$(LLVM_LOC)/bin/clang # -fopenmp
CXX=/usr/bin/clang++ # -fopenmp
# -O3 should be faster than -O2 (default) level optimisation ..
CFLAGS=-g -O3 -Wall -pedantic -std=gnu99 -mtune=native -pipe
CXXFLAGS=-g -O3 -Wall -pedantic -std=c++11 -mtune=native -pipe
LDFLAGS=-L/usr/local/opt/gettext/lib -L$(LLVM_LOC)/lib -Wl,-rpath,$(LLVM_LOC)/lib
# CPPFLAGS=-I/usr/local/opt/gettext/include -I$(LLVM_LOC)/include -I/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include

STAN_OPENCL=true
OPENCL_DEVICE_ID=0
OPENCL_PLATFORM_ID=0

(Modified to install data.table properly).

Any idea what could be going wrong? I suspect there is a problem linking against some C++ libraries…

Maybe @rok_cesnovar has time to look into this?

Thanks Martin. I forgot about this one.

Unfortunately, I do not have access to an M1 Mac, and no one has really tested OpenCL with the M1 ARM chips. We know it works nicely with regular ARM CPUs on Linux, but this is a bit of a different beast.

I will try to take a look in the next few days to see if I can find any information on this.

@rok_cesnovar Many thanks. I am afraid that Apple has decided to give up on OpenCL and only support its own library (Metal). As usual with Apple and open libraries…

Hi all, stale thread, I know, but I am having the same issue and wondering if you have found a resolution. Many thanks in advance.

Unfortunately, OpenCL acceleration with Stan on Apple Silicon is not possible. Apple does not have a direct OpenCL implementation on Apple Silicon, and instead uses a translation layer on top of its Metal framework. This layer is known not to be compatible with all OpenCL extensions, and issues like the one above are common.
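
For anyone who shares one script across machines, here is a minimal sketch in base R of how the OpenCL flags could be skipped on Apple Silicon. The helper name `is_apple_silicon` is hypothetical and not part of cmdstanr. The sketch assumes that an x86_64 build of R running under Rosetta 2 (as the sessionInfo() above suggests) reports `x86_64` as its machine type, so it also checks the macOS `sysctl.proc_translated` flag. The option names are copied from the script earlier in this thread and may differ in other cmdstanr versions.

```r
# Hypothetical helper (not part of cmdstanr): TRUE when running on Apple
# Silicon, including an x86_64 build of R translated by Rosetta 2.
is_apple_silicon <- function() {
  info <- Sys.info()
  if (!identical(info[["sysname"]], "Darwin")) {
    return(FALSE)
  }
  if (identical(info[["machine"]], "arm64")) {
    return(TRUE)
  }
  # Under Rosetta 2 the machine is reported as x86_64; macOS sets
  # sysctl.proc_translated to 1 for translated processes.
  translated <- tryCatch(
    suppressWarnings(
      system2("sysctl", c("-n", "sysctl.proc_translated"),
              stdout = TRUE, stderr = FALSE)
    ),
    error = function(e) character(0)
  )
  length(translated) == 1 && trimws(translated) == "1"
}

# Same options as in the original script; only used where OpenCL is
# expected to work.
opencl_options <- list(
  stan_opencl = TRUE,
  opencl_platform_id = 0,
  opencl_device_id = 0
)

# An empty list means the model is compiled as a plain CPU model.
cpp_options <- if (is_apple_silicon()) list() else opencl_options
```

The resulting `cpp_options` can then be passed to `cmdstan_model(stan_file, cpp_options = cpp_options)` in place of `opencl_options`, so the same script falls back to the CPU build on an M1.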

Thanks for the update. Frustrating to hear… although I have found that running my analysis on a MacBook Air w/ M1 is faster than on my 5+ y/o PC + OpenCL with an NVIDIA GPU. I suppose that’s some extension of Moore’s Law.