GPU setup on AWS EC2 with Docker

Hi there,

I’m trying to set up an AWS G4 instance to get going using a GPU. At this point I’m just trying to get the Hello World bernoulli in the docs to run using CmdStanPy in our Docker container.

I first did:

apt-get update && apt-get install ocl-icd-opencl-dev nvidia-cuda-toolkit clinfo

then tried to fit the model:


import os
from cmdstanpy import cmdstan_path, CmdStanModel

bernoulli_stan = os.path.join(cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.stan')

bernoulli_model = CmdStanModel(
    stan_file=bernoulli_stan,
    cpp_options={"STAN_OPENCL": True},
    )

bernoulli_data = os.path.join(
    cmdstan_path(), 'examples', 'bernoulli', 'bernoulli.data.json'
    )

bern_fit = bernoulli_model.sample(
    data=bernoulli_data,
    output_dir='/opt'
    )

but hit the following in stdout:

opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001

After a little research on the GPU threads here I started following instructions here.

I’m still stumped, though. Not sure if I missed some crucial documentation somewhere or if there are other tricks I need to be aware of, but the result of calling clinfo is:

Number of platforms                               0

The result of cat /etc/OpenCL/vendors/* is:

libnvidia-opencl.so.1

so I am not exactly sure how to proceed with step 3 in the doc I linked to. I would appreciate any assistance or links to docs, or a redirect if I’m entirely misguided here.
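One way to narrow this down: each file in /etc/OpenCL/vendors names a library that the OpenCL ICD loader tries to open, so an .icd file that points at a library that cannot be loaded is one plausible cause of clinfo reporting zero platforms. The sketch below is only a diagnostic under that assumption. The function name `check_icd_files` is made up for illustration, and the code assumes each .icd file holds a single library name or path on its first non-empty line.

```python
import ctypes
import glob
import os


def check_icd_files(vendor_dir="/etc/OpenCL/vendors"):
    """Map each library named in an .icd file to whether it can be loaded.

    Hypothetical helper: assumes the first non-empty line of each .icd file
    is the library name or path.
    """
    results = {}
    for icd_path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(icd_path) as handle:
            lines = [line.strip() for line in handle if line.strip()]
        if not lines:
            continue
        library = lines[0]
        try:
            # dlopen the library the same way a loader would look it up.
            ctypes.CDLL(library)
            results[library] = True
        except OSError:
            results[library] = False
    return results


if __name__ == "__main__":
    for library, loadable in check_icd_files().items():
        status = "loadable" if loadable else "NOT loadable"
        print(f"{library}: {status}")
```

If libnvidia-opencl.so.1 shows up as not loadable, the vendor file is registered but the driver library behind it is not visible in the current environment.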

Thanks for your help!

I think the only thing missing is a reboot of the instance.

The driver is installed with the cuda-toolkit so that should be good.


I apologize, it seems that installing the driver separately is required. So the minimal instructions are:

sudo apt-get -y update
sudo apt-get install -y nvidia-driver-460 nvidia-cuda-toolkit clinfo

Someone seeing this at a later point should replace 460 with a larger number if one is available.
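Picking the "larger number" can be sketched as follows. This assumes the Ubuntu package naming scheme nvidia-driver-NNN seen in the command above. The function name `newest_driver_package` is hypothetical, and the largest number is not guaranteed to be the right driver for a given GPU, so treat the result as a starting point only.

```python
import re


def newest_driver_package(package_names):
    """Return the nvidia-driver-NNN name with the largest NNN, or None.

    Hypothetical helper: variants such as nvidia-driver-470-server are
    ignored on purpose, because only plain nvidia-driver-NNN names match.
    """
    pattern = re.compile(r"^nvidia-driver-(\d+)$")
    versions = []
    for name in package_names:
        match = pattern.match(name.strip())
        if match:
            versions.append(int(match.group(1)))
    if not versions:
        return None
    return f"nvidia-driver-{max(versions)}"


if __name__ == "__main__":
    # Example input; real names could come from `apt-cache pkgnames nvidia-driver-`.
    candidates = ["nvidia-driver-460", "nvidia-driver-470", "nvidia-driver-470-server"]
    print(newest_driver_package(candidates))
```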


@rok_cesnovar thanks for the suggestions! Still having trouble here, though. I think it is perhaps made a little more complicated by the use of a Docker image. My hunch is I probably need to install the drivers and toolchain outside Docker, which took me to the AWS NVIDIA driver install guide. I’m not sure how that matches up with what CmdStan expects. In any case, when I figure out the full toolchain install I’ll post an update for posterity.


Hi @nerutenbeck, I’m going to add a Stan blog post on setting up RStudio and Stan on EC2. Do you have any interest in doing one on setting up a GPU?


@spinkney I’d love to! Thanks for the offer. DM me to briefly discuss timing and details? I’m also on the Stan slack channel if you’d rather discuss there.

Did the stan on ec2-gpu blog post ever go up? If so could you provide a link?

Hi, it’s nearly 2 years later, but I had similar errors trying to set up GPUs on an HPC platform, using Apptainer instead of Docker.

The first problem I had was with the output of
$ clinfo
Number of platforms 0

This is because the container needs to be told to use GPUs: for Apptainer, pass the --nv flag when executing/running, and for Docker, pass the --gpus all flag. Then the container can find the platform and device when running clinfo -l.
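As a rough sketch of that check, the snippet below builds the container command with the GPU flag and reads the platform count from clinfo output in the "Number of platforms" format quoted in this thread. The helper names and the image names are hypothetical. For Docker, this also assumes the host already has the NVIDIA driver and container toolkit set up so that --gpus all works.

```python
import re
import subprocess


def clinfo_command(runtime, image, command=("clinfo",)):
    """Build the argv for running clinfo in a container with GPU access."""
    if runtime == "docker":
        return ["docker", "run", "--rm", "--gpus", "all", image, *command]
    if runtime == "apptainer":
        return ["apptainer", "exec", "--nv", image, *command]
    raise ValueError(f"unsupported runtime: {runtime!r}")


def platform_count(clinfo_output):
    """Return the number after 'Number of platforms', or None if absent."""
    match = re.search(r"Number of platforms\s+(\d+)", clinfo_output)
    return int(match.group(1)) if match else None


def container_platform_count(runtime, image):
    """Run clinfo inside the container and parse the platform count."""
    completed = subprocess.run(
        clinfo_command(runtime, image),
        capture_output=True,
        text=True,
        check=False,
    )
    return platform_count(completed.stdout)
```

For example, `container_platform_count("docker", "my-stan-image")` (a made-up image name) returning 0 would suggest the GPU is still not visible inside the container.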

The second was the following error:
opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001

This was solved by installing some additional dependencies when building the container
apt-get -y install pocl-opencl-icd nvidia-settings
which I found from reading this reference.
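One caveat worth checking after this step: pocl-opencl-icd installs PoCL (Portable Computing Language), which is an OpenCL implementation that typically targets the CPU, so the error can disappear while the model is not necessarily running on the GPU. The sketch below lists platform names so you can see which platform sits at which index. It assumes `clinfo -l` prints lines of the form "Platform #0: NAME", and both helper names are hypothetical.

```python
import re


def platform_names(clinfo_listing):
    """Return platform names in index order from `clinfo -l` style output.

    Assumes lines shaped like 'Platform #0: NVIDIA CUDA'.
    """
    pattern = re.compile(r"^Platform #(\d+):\s*(.+?)\s*$", flags=re.MULTILINE)
    found = sorted((int(idx), name) for idx, name in pattern.findall(clinfo_listing))
    return [name for _, name in found]


def nvidia_platform_index(clinfo_listing):
    """Return the index of the first platform whose name mentions NVIDIA."""
    for index, name in enumerate(platform_names(clinfo_listing)):
        if "nvidia" in name.lower():
            return index
    return None


if __name__ == "__main__":
    # Illustrative text only, not captured from a real machine.
    sample = (
        "Platform #0: Portable Computing Language\n"
        " `-- Device #0: example-cpu-device\n"
        "Platform #1: NVIDIA CUDA\n"
        " `-- Device #0: example-gpu-device\n"
    )
    print(platform_names(sample))
    print(nvidia_platform_index(sample))
```

If the NVIDIA platform is not at index 0, the platform ID passed to Stan would need to match the index reported here.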

I hope that helps anyone who is similarly stuck.
Chris


Thank you for your help! The solution “apt-get -y install pocl-opencl-icd nvidia-settings” worked for me. I want to use brms in Google Colab (runtime type: R) to run some models with GPU acceleration. For example, the following code:

m1 <- brm(y~ x+(1|id), family=negbinomial, data = df, warmup = 1000, iter = 2000, cores=4,chains = 4, opencl = opencl(c(0, 0)), backend="cmdstanr")

However, it returned many errors, mainly related to “opencl”. Then, I tried running this line in the Colab terminal: “apt-get -y install pocl-opencl-icd nvidia-settings”. Amazingly, it worked!

Hope this can help others who want to run brms with a GPU on Google Colab!
