I’m trying to set myself up to give Stan a spin on the GPU. I’ve tried to follow the instructions, but I think I’m stuck. I’ve checked the output of `clinfo -l`:

```
Platform #0: Apple
+-- Device #0: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
`-- Device #1: AMD Radeon Pro 580 Compute Engine
```

From this I understand that I need to put the following in a text file named `local`, sitting in a directory called `make`, which in turn sits at the top level of the math library:

```
STAN_OPENCL=true
OPENCL_DEVICE_ID=1
OPENCL_PLATFORM_ID=0
```
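For completeness, this is how that file can be written from the terminal. The path is the one from my install, expressed via `$HOME`; adjust if yours differs:

```shell
# Write the OpenCL flags into make/local inside the math library.
# Path below matches my cmdstanr install; change it if yours is elsewhere.
MATH="$HOME/.cmdstanr/cmdstan-2.26.0/stan/lib/stan_math"
mkdir -p "$MATH/make"
cat > "$MATH/make/local" <<'EOF'
STAN_OPENCL=true
OPENCL_DEVICE_ID=1
OPENCL_PLATFORM_ID=0
EOF

# Echo it back to confirm what was written.
cat "$MATH/make/local"
```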

I’ve installed CmdStan 2.26.0 using `cmdstanr::install_cmdstan()`. My best guess for what counts as the “top level of the math library” is `/Users/jacobsocolar/.cmdstanr/cmdstan-2.26.0/stan/lib/stan_math`. Does this look right?

So I now have a text file at `/Users/jacobsocolar/.cmdstanr/cmdstan-2.26.0/stan/lib/stan_math/make/local` that contains:

```
STAN_OPENCL=true
OPENCL_DEVICE_ID=1
OPENCL_PLATFORM_ID=0
```

Does this look right as well?

So then I try to figure out whether this GPU thing is working. In terminal I can run, for example,

```
cd /Users/jacobsocolar/.cmdstanr/cmdstan-2.26.0/stan/lib/stan_math/
python runTests.py test/unit -f opencl
```

but I’m not too clear on what I should be looking for in the big text dump that comes out.
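My working assumption is that the Stan Math unit tests are googletest-based, so the lines worth scanning for would be the `[  PASSED  ]` / `[  FAILED  ]` summaries. Here's the filter I have in mind, demonstrated on illustrative googletest-style lines (not actual output from my run):

```shell
# Illustrative googletest-style summary lines (NOT real output from my
# machine), written to a scratch file so the grep below has input:
cat > /tmp/opencl_tests.log <<'EOF'
[==========] 12 tests from 3 test suites ran. (8421 ms total)
[  PASSED  ] 11 tests.
[  FAILED  ] 1 test, listed below:
EOF

# The same filter applied to the real run would be:
#   python runTests.py test/unit -f opencl 2>&1 | grep -E '\[  (PASSED|FAILED)  \]'
grep -E '\[  (PASSED|FAILED)  \]' /tmp/opencl_tests.log
```

If that guess about the output format is right, a run with no `[  FAILED  ]` lines would mean the OpenCL tests all passed.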

On the other hand, I’ve taken the logistic regression example from the GPU support for Stan paper:

```
data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  int y[n];
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
```

And in R I run:

```
library(cmdstanr)
n <- 1e+6
k <- 10
X <- matrix(rnorm(n*k), nrow=n)
mu <- 3*X[,1] - 2*X[,2] + 1
y <- rbinom(n, 1, 1/(1+exp(-mu)))
stan_data <- list(k=k, n=n, X=X, y=y)
gpu_test_mod <- cmdstan_model("/Users/jacobsocolar/Desktop/gpu_logistic_test.stan", force_recompile = TRUE)
test_sampling <- gpu_test_mod$sample(data=stan_data, chains=3, parallel_chains = 3)
```

And I get execution times for the `$sample` line on the order of 1300 seconds, which is a lot longer than I expected if this is running on the GPU.
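One diagnostic I thought of (my own idea, not something from the docs): checking whether the compiled model binary is actually linked against OpenCL. On macOS, `otool -L` lists the libraries/frameworks a binary links against, so something like this should show an OpenCL line if `STAN_OPENCL` took effect at compile time:

```shell
# cmdstanr puts the compiled executable next to the .stan file, so this is
# the path on my machine. Guard the check so it degrades gracefully when
# otool or the binary isn't available.
BIN="/Users/jacobsocolar/Desktop/gpu_logistic_test"
if command -v otool >/dev/null 2>&1 && [ -x "$BIN" ]; then
  otool -L "$BIN" | grep -i opencl
else
  echo "cannot check here (need macOS otool and the compiled binary)"
fi
```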

Am I missing something?