Failed to Execute Stan's `runTests.py` on WSL2

I previously reported in this thread that I failed to fit a model using CmdStanR on the GPU with OpenCL on WSL2, encountering the error Chain <CHAIN_NUMBER> OpenCL Initialization: [Device] CL_INVALID_DEVICE: Unknown error -1. After reading Help setting up for GPU computation (OSX), I realised that I might have failed to execute runTests.py as described in Stan Math Library: OpenCL CPU/GPU Support and Quick Start · stan-dev/math Wiki. I would like to confirm whether this is indeed the case.

Question

Following the articles above, I executed runTests.py as described below:

cd ~/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/
python3 runTests.py test/unit -f opencl

A large number of tests ran, but for all tests within the visible command line history, I saw Running 0 tests from 0 test suites. For example:

------------------------------------------------------------
test/unit/math/opencl/cholesky_decompose_test --gtest_output="xml:test/unit/math/opencl/cholesky_decompose_test.xml"
Running main() from lib/benchmark_1.5.1/googletest/googletest/src/gtest_main.cc
[==========] Running 0 tests from 0 test suites.
[==========] 0 tests from 0 test suites ran. (0 ms total)
[  PASSED  ] 0 tests

Does this indicate that the tests are failing? If the tests were successful, what should the output look like?
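When the output scrolls past the terminal history, it can help to save it to a file and scan it. The following is a minimal sketch, assuming the runTests.py output was captured with tee; the function name zero_test_binaries and the log file name are made up for illustration:

```shell
# Hypothetical helper: list test executables that reported zero tests.
# Assumes the runTests.py output was saved to a file first, e.g. with
#   python3 runTests.py test/unit -f opencl 2>&1 | tee opencl_tests.log
zero_test_binaries() {
    awk '
        # Remember the most recent line that launches a test executable.
        /^test\/.*_test / { current = $1 }
        # Report that executable if gtest says nothing was run.
        /Running 0 tests from 0 test suites/ { print current }
    ' "$1"
}
```

Calling zero_test_binaries opencl_tests.log would then print one path per test executable that ran no tests.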

My environments

OS, programming languages, and hardware

  • Operating System: Ubuntu 24.04 LTS on Windows Subsystem for Linux 2 (WSL2)

    • All programs below were installed and run within the WSL2 environment, NOT the native Windows environment.
  • CmdStan Version: CmdStan v2.35.0

  • R 4.4.1

    • cmdstanr: 0.8.1
  • Python 3.12.3

  • Compiler/Toolkit:

    • CUDA 12.1
      $ nvcc -V
      
      nvcc: NVIDIA (R) Cuda compiler driver
      Copyright (c) 2005-2023 NVIDIA Corporation
      Built on Mon_Apr__3_17:16:06_PDT_2023
      Cuda compilation tools, release 12.1, V12.1.105
      Build cuda_12.1.r12.1/compiler.32688072_0
      
    • GPU: NVIDIA RTX 3060 (12GB of dedicated memory)
      $ clinfo -l
      
      Platform #0: Portable Computing Language
       `-- Device #0: NVIDIA GeForce RTX 3060
      
    • CPU: Intel Core i9-10980XE (18 cores, 36 threads)

Installation of PoCL

Initially, clinfo -l did not display any platforms or devices, due to the issue described in a GitHub issue entitled No OpenCL platforms reported · Issue #6951 · microsoft/WSL. Following the solution provided in a comment of the issue thread, I successfully installed PoCL, and now clinfo and clinfo -l recognise my GPU.

I repeated the installation several times with different settings, each time running xargs rm < install_manifest.txt in the pocl-6.0/build directory and then deleting the pocl-6.0/build directory before reinstalling. The final installation went as follows:

  1. Executed the following commands to install PoCL as per the official PoCL installation guide:
    export LLVM_VERSION=18
    apt install -y python3-dev libpython3-dev build-essential ocl-icd-libopencl1 \
        cmake git pkg-config libclang-${LLVM_VERSION}-dev clang-${LLVM_VERSION} \
        llvm-${LLVM_VERSION} make ninja-build ocl-icd-libopencl1 ocl-icd-dev \
        ocl-icd-opencl-dev libhwloc-dev zlib1g zlib1g-dev clinfo dialog apt-utils \
        libxml2-dev libclang-cpp${LLVM_VERSION}-dev libclang-cpp${LLVM_VERSION} \
        llvm-${LLVM_VERSION}-dev
    
  2. Downloaded PoCL:
    wget https://github.com/pocl/pocl/archive/refs/tags/v6.0.tar.gz
    
  3. Extracted the tarball:
    tar -xzvf v6.0.tar.gz
    
  4. Changed to the PoCL directory:
    cd pocl-6.0
    
  5. Created a build directory:
    mkdir build
    
  6. Built PoCL following the instructions from the GitHub issue comment:
    # ENABLE_HOST_CPU_DEVICES=OFF: having both CPU and GPU simultaneously can cause issues
    #   https://github.com/pocl/pocl/issues/853#issuecomment-696367623
    # WITH_LLVM_CONFIG: https://forums.developer.nvidia.com/t/need-support-to-run-opencl-application-on-tx2-board/264420/4
    # ENABLE_EXAMPLES=ON: to install the CUDA tests, see NVIDIA GPU support in the PoCL 6.0 documentation
    #   https://portablecl.org/docs/html/cuda.html#run-tests
    cmake -B build \
        -DCMAKE_C_FLAGS=-L/usr/lib/wsl/lib \
        -DCMAKE_CXX_FLAGS=-L/usr/lib/wsl/lib \
        -DENABLE_HOST_CPU_DEVICES=OFF \
        -DENABLE_CUDA=ON \
        -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-${LLVM_VERSION} \
        -DENABLE_EXAMPLES=ON
    
  7. Compiled PoCL:
    cmake --build build -j34
    
  8. Added environment variables to .bashrc:
    echo 'export POCL_BUILDING=1' >> ~/.bashrc
    echo 'export OCL_ICD_VENDORS=<FULL_PATH_OF_MY_HOME_DIR>/pocl-6.0/build/ocl-vendors/' >> ~/.bashrc
    
    sudo nano /etc/OpenCL/vendors/nvidia.icd
    # Default is `/libnvidia-opencl.so` but there is no such a file in that path.
    # Therefore, replace the default line with the following line:
    # export /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.90.07
    
    source ~/.bashrc
    
  9. Installed PoCL:
    cmake --install build
    
  10. Verified GPU recognition with clinfo --list:
    $ clinfo --list
    
    Platform #0: Portable Computing Language
        `-- Device #0: NVIDIA GeForce RTX 3060
    

In the aforementioned Help setting up for GPU computation (OSX) - Modeling - The Stan Forums, @rok_cesnovar responded, so I am tagging you here in case you have any insights (apologies if this is inconvenient). Comments from others are also highly welcome. Any ideas or suggestions would be greatly appreciated…!

That indicates that the STAN_OPENCL define was not set. Do you have STAN_OPENCL=true either in your environment variables or in make/local? The entire test is wrapped in an #ifdef.
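A quick way to check both places is sketched below. This is only an illustration: the function name stan_opencl_status is hypothetical, and the directory passed in is assumed to be the one whose make/local the build actually reads.

```shell
# Hypothetical helper: report where (if anywhere) STAN_OPENCL is set.
# $1 is the directory that contains make/local (an assumption about the layout).
stan_opencl_status() {
    pattern='^[[:space:]]*STAN_OPENCL[[:space:]]*[:?]?='
    # "env" lists only exported variables, which are the ones make inherits.
    if env | grep -q '^STAN_OPENCL='; then
        echo "environment: $(env | grep '^STAN_OPENCL=')"
    elif [ -f "$1/make/local" ] && grep -Eq "$pattern" "$1/make/local"; then
        echo "make/local: $(grep -E "$pattern" "$1/make/local" | tail -n 1)"
    else
        echo "STAN_OPENCL not set"
    fi
}
```

For example, stan_opencl_status ~/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math would check the environment first and then the make/local of the Math library, if that file exists.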

The last successful run in CI yielded

test/unit/math/opencl/rev/cholesky_decompose_test --gtest_output="xml:test/unit/math/opencl/rev/cholesky_decompose_test.xml"
Running main() from lib/benchmark_1.5.1/googletest/googletest/src/gtest_main.cc
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from OpenCLCholeskyDecompose
[ RUN      ] OpenCLCholeskyDecompose.prim_rev_values_small
4 warnings generated.
2 warnings generated.
[       OK ] OpenCLCholeskyDecompose.prim_rev_values_small (3085 ms)
[ RUN      ] OpenCLCholeskyDecompose.prim_rev_size_1
2 warnings generated.
[       OK ] OpenCLCholeskyDecompose.prim_rev_size_1 (103 ms)
[ RUN      ] OpenCLCholeskyDecompose.prim_rev_size_0
[       OK ] OpenCLCholeskyDecompose.prim_rev_size_0 (0 ms)
[ RUN      ] OpenCLCholeskyDecompose.prim_rev_values_large
[       OK ] OpenCLCholeskyDecompose.prim_rev_values_large (4771 ms)
[----------] 4 tests from OpenCLCholeskyDecompose (7959 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (7959 ms total)
[  PASSED  ] 4 tests.
------------------------------------------------------------

@WardBrian Thank you for your response. I now understand what the output looks like when Stan passes the tests. In fact, I had already set STAN_OPENCL=true in make/local in ~/.cmdstan/cmdstan-2.35.0/make/ for this trial and for the previous one. However, I have not yet tried setting STAN_OPENCL=true as an environment variable (i.e. I have executed neither export STAN_OPENCL=true nor echo 'export STAN_OPENCL=true' >> ~/.bashrc && source ~/.bashrc), so I will try that.

My current make/local looks like:

CXXFLAGS += -Wno-deprecated-declarations
LDFLAGS+= -L"/usr/local/cuda-12.1/targets/x86_64-linux/lib" -lOpenCL
LDFLAGS_OPENCL= -L"/usr/local/cuda-12.1/targets/x86_64-linux/lib" -lOpenCL
STAN_OPENCL=true
OPENCL_DEVICE_ID=0
OPENCL_PLATFORM_ID=0
CC = g++

Do I also have to copy that file to ~/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/make/local? I do not have ~/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/make/local so far but I will create it if necessary.

I realised that python3 runTests.py -j4 test/unit/math/opencl took almost 2 hours, according to the Stan Math CI run you shared (jenkins / Stan/Math / develop / #297…). I actually stopped my first trial of python3 runTests.py test/unit -f opencl midway because it was taking longer than I expected (over 30 minutes or so). Later, I ran python3 runTests.py test/unit -f opencl again, and that run produced Running 0 tests from 0 test suites.

I’m now re-running python3 runTests.py test/unit -f opencl after rebuilding CmdStan with STAN_OPENCL=true set as an environment variable. It’s still running and I’m waiting for the results.

@WardBrian

After I set STAN_OPENCL=true in make/local in ~/.cmdstan/cmdstan-2.35.0/make/ AND set STAN_OPENCL=true as an environment variable, I ran python3 runTests.py test/unit -f opencl in ~/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/. It took 4 hours to complete.

The log says that the following 4 tests ended up with failures:

  1. [ FAILED ] KernelGenerator.erf_test
    [ RUN      ] KernelGenerator.erf_test
    unknown file: Failure
    C++ exception with description "compile_kernel: calculate : Unknown error -11" thrown in the test body.
    [  FAILED  ] KernelGenerator.erf_test (229 ms) 
    
  2. [ FAILED ] KernelGenerator.erfc_test
    [ RUN      ] KernelGenerator.erfc_test
    unknown file: Failure
    C++ exception with description "compile_kernel: calculate : Unknown error -11" thrown in the test body.
    [  FAILED  ] KernelGenerator.erfc_test (211 ms)
    
  3. [ FAILED ] KernelGenerator.Phi_test
    [ RUN      ] KernelGenerator.Phi_test
    unknown file: Failure
    C++ exception with description "compile_kernel: calculate : Unknown error -11" thrown in the test body.
    [  FAILED  ] KernelGenerator.Phi_test (208 ms)
    
  4. [ FAILED ] KernelGenerator.inv_Phi_test
    [ RUN      ] KernelGenerator.inv_Phi_test
    unknown file: Failure
    C++ exception with description "compile_kernel: calculate : Unknown error -11" thrown in the test body.
    [  FAILED  ] KernelGenerator.inv_Phi_test (236 ms)
    

Is there anything I can do (e.g. install additional packages, revise PATH settings, etc.) to make these failed tests pass?

I attached the test results part shown in my command line: test_opencl_stan_math.txt (31.5 KB)
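For anyone who wants to pull the failing test names out of such a log, here is a small sketch. It assumes the log uses the gtest line format shown above; the function name failed_tests is made up for illustration.

```shell
# Hypothetical helper: print the unique names of failed gtest cases in a log.
failed_tests() {
    # Per-test failure lines look like "[  FAILED  ] Suite.test_name (229 ms)".
    # Requiring a dot in the fourth field skips summary lines such as
    # "[  FAILED  ] 4 tests, listed below:".
    grep -F '[  FAILED  ] ' "$1" | awk '$4 ~ /\./ { print $4 }' | sort -u
}
```

Running failed_tests test_opencl_stan_math.txt on the attached log would list each failed test once.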

A couple things:

  1. Yes, you need to put a separate make/local in the stan_math folder if you want it to apply to tests. It does not look up the tree for the cmdstan one.
  2. Most of that test time is probably compiling. You can run python3 runTests.py followed by a full path to a single _test.cpp file to run one if you need to, which should be much faster
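Both points can be sketched in shell roughly as follows. This is an illustration rather than an official procedure: the function name copy_make_local is hypothetical, the CmdStan path is the one mentioned earlier in the thread, and the single test file is assumed to sit at the path implied by the CI output above.

```shell
# Hypothetical helper: give the Math library its own make/local by copying
# the CmdStan one. $1 is the CmdStan root directory.
copy_make_local() {
    src="$1/make/local"
    dst_dir="$1/stan/lib/stan_math/make"
    if [ ! -f "$src" ]; then
        echo "missing $src" >&2
        return 1
    fi
    # Refuse to overwrite an existing file so hand-made settings survive.
    if [ -e "$dst_dir/local" ]; then
        echo "$dst_dir/local already exists" >&2
        return 1
    fi
    mkdir -p "$dst_dir"
    cp "$src" "$dst_dir/local"
}

# Usage (commented out because it needs a real CmdStan installation):
# copy_make_local ~/.cmdstan/cmdstan-2.35.0
# cd ~/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math
# python3 runTests.py test/unit/math/opencl/rev/cholesky_decompose_test.cpp
```

Copying the whole file also carries over unrelated settings such as compiler flags; a make/local containing only the OpenCL-related lines may be enough, but that has not been verified here.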

Debugging the failed tests is more difficult. That error indicates something went wrong during the kernel compilation inside OpenCL – the last time I saw it, it was due to a CUDA driver issue interacting oddly with docker. But if the other tests are passing, something is working, at least.
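For reference, the numbers in these messages are standard OpenCL status codes from the Khronos headers, and -11 is CL_BUILD_PROGRAM_FAILURE, which matches the kernel-compilation explanation. A small lookup sketch for a few common codes follows; the function name opencl_error_name is made up, and the table is deliberately incomplete:

```shell
# Hypothetical helper: translate a few OpenCL status codes into their names
# as defined in the Khronos CL/cl.h header.
opencl_error_name() {
    case "$1" in
        0)   echo "CL_SUCCESS" ;;
        -1)  echo "CL_DEVICE_NOT_FOUND" ;;
        -2)  echo "CL_DEVICE_NOT_AVAILABLE" ;;
        -3)  echo "CL_COMPILER_NOT_AVAILABLE" ;;
        -5)  echo "CL_OUT_OF_RESOURCES" ;;
        -6)  echo "CL_OUT_OF_HOST_MEMORY" ;;
        -11) echo "CL_BUILD_PROGRAM_FAILURE" ;;
        -33) echo "CL_INVALID_DEVICE" ;;
        *)   echo "unlisted OpenCL code $1" ;;
    esac
}
```

Note that -1 maps to CL_DEVICE_NOT_FOUND in the headers, which may be worth keeping in mind when reading the "Unknown error -1" message from the original report.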

Have you tried using this hardware for Stan in normal windows (not WSL)?


@WardBrian Sorry for my late reply.

  1. I understand that a separate make/local in the stan_math folder is required for STAN_OPENCL=true to take effect during the tests.
  2. Thank you for showing me how to speed up the tests.

Actually, I have not tested Stan in native Windows on the machine in question (i.e. the computer that has the RTX 3060). I am afraid I will be unable to test that soon, but I will try it.