Tutorial on running Stan code with GPU acceleration on WSL

This post documents how I got GPU acceleration working for Stan on WSL (Windows Subsystem for Linux). After repeated problems with OpenCL, the procedure below worked for me, and I am sharing it for fellow developers and data scientists.

The solution involves three key phases:

  1. PoCL implementation
  2. R environment configuration
  3. Validation and execution

Prerequisites

This guide assumes you have:

  • Functional WSL installation with Linux distribution
  • Operational RStudio Server
  • Installed cmdstanr package

1. PoCL Implementation

Given NVIDIA’s lack of native OpenCL support for WSL, we employ the Portable Computing Language (PoCL) framework.

Critical Preliminary Steps:

  • Remove any existing OpenCL installations in WSL (clinfo -l should return empty)
  • Follow this comprehensive guide for PoCL configuration
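
The first preliminary step can be checked from the shell. The sketch below assumes that existing OpenCL installations register themselves through .icd files in /etc/OpenCL/vendors, the conventional directory read by the ICD loader; the helper name list_icd_files is illustrative and not part of any tool.

```shell
# List OpenCL ICD registration files in a vendor directory.
# Assumption: /etc/OpenCL/vendors is where installed OpenCL runtimes register.
list_icd_files() {
  dir="${1:-/etc/OpenCL/vendors}"
  for f in "$dir"/*.icd; do
    # An unmatched glob stays literal, so test that the path exists.
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
}

leftover="$(list_icd_files)"
if [ -n "$leftover" ]; then
  echo "Existing OpenCL ICD files found:"
  echo "$leftover"
else
  echo "No OpenCL ICD files found."
fi
```

If files are listed, removing the packages that own them should bring clinfo -l back to an empty result before PoCL is built.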

Implementation Workflow:

  1. Host Machine Preparation
  • Install the latest NVIDIA drivers on the Windows host
  • Do not install GPU drivers within WSL
  • Reference: CUDA on WSL User Guide
  2. CUDA Toolkit Installation
  • Install CUDA 12.4 (recommended for PyTorch compatibility) via the official repository
  • Verify that the GPU is visible from inside WSL:

nvidia-smi

  3. PoCL Compilation

# Download and unpack the PoCL 6.0 sources
wget https://github.com/pocl/pocl/archive/refs/tags/v6.0.zip
unzip v6.0.zip && cd pocl-6.0 && mkdir build
# Configure a CUDA-only build; -L points the linker at the WSL driver libraries
cmake -B build \
  -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
  -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
  -DENABLE_HOST_CPU_DEVICES=OFF \
  -DENABLE_CUDA=ON
cmake --build build -j$(nproc)
# Use PoCL from the build tree and tell the ICD loader where its .icd file is
echo 'export POCL_BUILDING=1' >> ~/.bashrc
echo 'export OCL_ICD_VENDORS=~/pocl-6.0/build/ocl-vendors/' >> ~/.bashrc
source ~/.bashrc
# May need sudo if the install prefix is not writable by your user
cmake --install build
sudo apt install clinfo
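
Before running the build commands above, a pre-flight check along these lines can catch missing tools early. This is a minimal sketch: the helper name missing_tools is illustrative, and the tool list only covers commands that appear in this guide.

```shell
# Print each command name from the arguments that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# Commands used by the download and build steps in this guide.
missing="$(missing_tools wget unzip cmake nvidia-smi)"
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All build tools found."
fi

# The build passes -L/usr/lib/wsl/lib, so that directory should exist.
if [ -d /usr/lib/wsl/lib ]; then
  echo "Found /usr/lib/wsl/lib"
else
  echo "Warning: /usr/lib/wsl/lib not found (is this a WSL session?)"
fi
```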

Verification:

clinfo --list
# Expected output: your GPU model, e.g. NVIDIA GeForce RTX 4090
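
The verification can also be scripted. The sketch below assumes that clinfo --list prints one line containing "Device #" per device; the helper name count_opencl_devices is illustrative. It also prints the two environment variables set earlier, since an unset variable is a likely cause of an empty device list.

```shell
# Count device lines in `clinfo --list` style output read from stdin.
# Assumption: each device appears on a line containing "Device #".
count_opencl_devices() {
  grep -c 'Device #' || true
}

if command -v clinfo >/dev/null 2>&1; then
  n="$(clinfo --list 2>/dev/null | count_opencl_devices)"
  echo "OpenCL devices visible: $n"
else
  echo "clinfo is not installed"
fi

# Show whether the variables from ~/.bashrc reached this shell.
echo "POCL_BUILDING=${POCL_BUILDING:-<unset>}"
echo "OCL_ICD_VENDORS=${OCL_ICD_VENDORS:-<unset>}"
```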

2. R Environment Configuration

RStudio Server sessions typically do not inherit the variables exported in ~/.bashrc, so they must also be set for R:

  1. Edit system-wide R configuration:
sudo vim /usr/lib/R/etc/Renviron.site

Append (if the path is not picked up, replace ~ with the absolute path to your home directory):

POCL_BUILDING=1
OCL_ICD_VENDORS=~/pocl-6.0/build/ocl-vendors/

  2. Restart R session (Session -> Restart R)
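
As an alternative to editing the file by hand, the two entries can be appended idempotently. This is a sketch only: ensure_renviron_var is an illustrative helper, and it is demonstrated on a scratch file rather than on /usr/lib/R/etc/Renviron.site. It writes the path via $HOME, so the entry is absolute and does not rely on ~ being expanded.

```shell
# Append "NAME=value" to an Renviron-style file unless NAME is already set.
# ensure_renviron_var is an illustrative helper, not part of R or RStudio.
ensure_renviron_var() {
  file="$1"; name="$2"; value="$3"
  if grep -q "^${name}=" "$file" 2>/dev/null; then
    echo "$name already set in $file"
  else
    printf '%s=%s\n' "$name" "$value" >> "$file"
    echo "added $name to $file"
  fi
}

# Demonstration on a scratch file; the third call shows that a repeat is a no-op.
scratch="$(mktemp)"
ensure_renviron_var "$scratch" POCL_BUILDING 1
ensure_renviron_var "$scratch" OCL_ICD_VENDORS "$HOME/pocl-6.0/build/ocl-vendors/"
ensure_renviron_var "$scratch" POCL_BUILDING 1
cat "$scratch"
```

To apply it to the real file, the same calls would need to run with root privileges against /usr/lib/R/etc/Renviron.site.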

3. Validation & Execution

OpenCL Verification (requires the OpenCL R package):

OpenCL::oclPlatforms()

Stan Model (saved as opencl-files/bernoulli_logit_glm.stan):

data {
  int<lower=1> k;
  int<lower=0> n;
  matrix[n, k] X;
  array[n] int y;
}
parameters {
  vector[k] beta;
  real alpha;
}
model {
  target += std_normal_lpdf(beta);
  target += std_normal_lpdf(alpha);
  target += bernoulli_logit_glm_lpmf(y | X, alpha, beta);
}
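
R Script:

library(cmdstanr)

# Synthetic dataset
n <- 250000
k <- 20
X <- matrix(rnorm(n * k), ncol = k)
y <- rbinom(n, size = 1, prob = plogis(3 * X[,1] - 2 * X[,2] + 1))
mdata <- list(k = k, n = n, y = y, X = X)

# GPU-accelerated compilation
mod_cl <- cmdstan_model("opencl-files/bernoulli_logit_glm.stan",
                        cpp_options = list(stan_opencl = TRUE))

# Sample on the GPU. opencl_ids = c(platform, device); c(0, 0) assumes the
# PoCL CUDA device is the first platform and device reported by clinfo.
fit_cl <- mod_cl$sample(data = mdata, chains = 4, parallel_chains = 4,
                        opencl_ids = c(0, 0))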

Performance Monitoring:

Use nvitop to monitor GPU utilization in real time. If the setup works, GPU utilization rises noticeably while the model is sampling.
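
If nvitop is not available, a one-off reading from nvidia-smi gives a rough equivalent. The sketch below queries the utilization of the first GPU and compares it with a threshold; the helper name gpu_is_busy and the 10% default are illustrative assumptions, not recommended values.

```shell
# Return success when a utilization percentage exceeds a threshold.
# The 10% default is an arbitrary illustration.
gpu_is_busy() {
  util="$1"; threshold="${2:-10}"
  [ "$util" -gt "$threshold" ] 2>/dev/null
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Query GPU utilization in percent as a bare number, first GPU only.
  util="$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits 2>/dev/null | head -n 1)"
  if [ -n "$util" ] && gpu_is_busy "$util"; then
    echo "GPU busy: ${util}%"
  else
    echo "GPU idle or utilization unavailable: ${util:-n/a}"
  fi
else
  echo "nvidia-smi not found"
fi
```

Running this in a second terminal while the sampler is active should report a busy GPU if the OpenCL path is actually in use.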
