Brms/cmdstan with OpenCL: chains finish unexpectedly

I’m trying to speed up a model on an HPC cluster by using the OpenCL support in CmdStan. To get everything up and running, I had to create my own Docker image (https://hub.docker.com/r/jofrhwld/stan-opencl). All of the diagnostics I include here come from a job submitted to the cluster that runs this image.

My make/local file looks like

CXX = clang++
STAN_OPENCL=true
OPENCL_PLATFORM_ID=0
OPENCL_DEVICE_ID=0

I tried running the sample brms model from here like so

library(cmdstanr)
#This is cmdstanr version 0.8.1
#- CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#- CmdStan path: /usr/share/.cmdstan
#- CmdStan version: 2.35.0
library(brms)

options(
    cmdstanr_verbose = TRUE
)

fit <- brm(count ~ zAge + zBase * Trt + (1|patient),
           data = epilepsy, family = poisson(),
           chains = 2, opencl = opencl(c(0, 0)),
           backend = "cmdstanr")

The model compiles successfully and begins sampling. The bottom of the output looks like:

Chain 2 opencl 
Chain 2   device = 0 
Chain 2   platform = 0 
Chain 2 opencl_platform_name = NVIDIA CUDA 
Chain 2 opencl_device_name = Tesla P100-PCIE-12GB 
Warning: Chain 2 finished unexpectedly!

Warning: Use read_cmdstan_csv() to read the results of the failed chains.
Error: Fitting failed. Unable to retrieve the metadata.
In addition: Warning messages:
1: All chains finished unexpectedly! Use the $output(chain_id) method for more information.
 
2: No chains finished successfully. Unable to retrieve the fit. 
Execution halted
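
The warning above points to the $output(chain_id) method, which is not reachable through brm() because no fit object is returned. A minimal sketch of one way to get at it, assuming the Stan program and data that brms generates for this formula are run through cmdstanr directly (the variable names are illustrative, and the generated program may not match exactly what brm() compiles when opencl is set):

```r
library(cmdstanr)
library(brms)

# Generate the Stan program and data list that brms builds for this model.
form  <- count ~ zAge + zBase * Trt + (1 | patient)
scode <- make_stancode(form, data = epilepsy, family = poisson())
sdata <- make_standata(form, data = epilepsy, family = poisson())

# Write the program to a temporary file and compile it with OpenCL enabled.
stan_file <- write_stan_file(scode)
mod <- cmdstan_model(stan_file, cpp_options = list(stan_opencl = TRUE))

# opencl_ids is c(platform, device), matching opencl(c(0, 0)) in the brm() call.
# unclass() drops the brms-specific class so a plain named list is passed.
fit <- mod$sample(data = unclass(sdata), chains = 1, opencl_ids = c(0, 0))

# Print the console output of chain 1, including any error from the failed run.
fit$output(1)
```

If the chain fails here as well, the text printed by fit$output(1) should contain the underlying CmdStan error rather than only the "finished unexpectedly" warning.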

Based on a previous question, I also tried

fit <- cmdstanr_example(chains = 1)

This also compiled successfully, but then resulted in

Chain 1 Unrecoverable error evaluating the log probability at the initial value.
Chain 1 Exception: compile_kernel: calculate : Unknown error -11 (in '/tmp/Rtmp2fQHM3/model-2c6a73b284efe.stan', line 13, column 2 to column 37)
Warning: Chain 1 finished unexpectedly!

Warning message:
No chains finished successfully. Unable to retrieve the fit. 

Any suggestions for resolving this would be much appreciated!

EDIT:

When I run clinfo -l I get back

Platform #0: NVIDIA CUDA
 `-- Device #0: Tesla P100-PCIE-12GB

For what it’s worth, I’ve just tried this locally on my M1 Mac and got similar results.

R -q -e "renv::init()"
clinfo -l
# Platform #0: Apple
#  `-- Device #0: Apple M1 Max

Then in R

install.packages("cmdstanr", repos = c('https://stan-dev.r-universe.dev', getOption("repos")))
renv::install("brms")

library(cmdstanr)
library(brms)
install_cmdstan(overwrite=TRUE)

fit <- brm(count ~ zAge + zBase * Trt + (1|patient),
           data = epilepsy, family = poisson(),
           chains = 1, opencl = opencl(c(0, 0)),
           backend = "cmdstanr")

# Compiling Stan program...
# Start sampling
# Running MCMC with 1 chain...
# 
# Warning: Chain 1 finished unexpectedly!
# 
# Error: Fitting failed. Unable to retrieve the metadata.
# In addition: Warning message:
# No chains finished successfully. Unable to retrieve the fit.

I’ve had a similar issue, and found that the error occurred more frequently the longer the chains ran. You may be able to decrease the number of iterations per chain and run more chains, then pool the results.
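
As a rough sketch of that workaround, assuming the same model as in the question: the chain and iteration counts below are illustrative only, chosen so that the total number of post-warmup draws (8 x 500 = 4000) matches the brms default of 4 chains with 1000 post-warmup draws each.

```r
library(brms)

# More, shorter chains: 8 chains x (1000 - 500) post-warmup draws = 4000 draws.
# brm() pools the draws from all chains into a single fit object.
fit_short <- brm(count ~ zAge + zBase * Trt + (1 | patient),
                 data = epilepsy, family = poisson(),
                 chains = 8, iter = 1000, warmup = 500,
                 opencl = opencl(c(0, 0)),
                 backend = "cmdstanr")
```

Each chain still needs its own warmup, so shortening the chains too far can leave adaptation incomplete. It is worth checking Rhat and effective sample size on the pooled fit.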