GPU CI testing


#1

@Stevo15025 sent me this message, but I have some time constraints over the next couple of weeks, so I want to open this up. @mitzimorris said she might be able to help with this issue. I've pasted the message below:

Is the Jenkins computer a Mac? Macs usually come with OpenCL, but this one does not seem to have anything available.

http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pull%20Request%20-%20Tests%20-%20Unit/921/console

Alternatively, is it possible to install the AMD OpenCL SDK on the Jenkins computer like we do on Travis?


#2

Thanks for posting this Sean!

@mitzimorris would it be possible to add the AMD OpenCL SDK to the Jenkins computers?

I believe you can use the install script here and the download script here on a Mac to add the AMD OpenCL SDK.


#3

Apple ships OpenCL but not the header-only C++ wrapper cl.hpp. You need to download the header and probably commit it to your repo; then it should work fine w/o AMD’s OpenCL SDK.


#4

hi Steve,
saw your request and added it to my “todo” list - will send update when I get to it - sorry for the lameness.
cheers,
Mitzi


#5

@mitzimorris thanks! I believe this is the last piece the GPU side needs for testing. Once we pass Jenkins we can start the review process.

@maedoc apologies, I grabbed the wrong console output from the Jenkins run. In the output below you can see the tests fail because of a linker error for OpenCL and not because of the OpenCL header, i.e.:

ld: library not found for -lOpenCL
clang: error: linker command failed with exit code 1 (use -v to see invocation)

http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pull%20Request%20-%20Tests%20-%20Unit/930/console


#6

Using -framework OpenCL instead of -lOpenCL will likely do the trick.
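
A minimal sketch of that switch, assuming the build chooses its linker flags from the OS name. The helper `opencl_link_flags` is hypothetical and not part of the Stan makefiles; the only fact it relies on is that macOS ships OpenCL as a framework, which matches the "library not found for -lOpenCL" error above.

```python
import platform


def opencl_link_flags(system=None):
    """Return OpenCL linker flags for the given OS name (hypothetical helper)."""
    # platform.system() reports "Darwin" on macOS and "Linux" on Linux.
    system = system or platform.system()
    if system == "Darwin":
        # macOS provides OpenCL as a framework rather than as libOpenCL.
        return ["-framework", "OpenCL"]
    # Elsewhere (e.g. Linux with an OpenCL SDK installed), link the library.
    return ["-lOpenCL"]


print(" ".join(opencl_link_flags()))
```

The same branch could live in a makefile instead; the point is only that the flag depends on the OS the build node reports.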


#7

uhm… at the risk of hijacking this… I am in a similar situation, as I need to test code inside Stan which has “special” dependencies (MPI). So I am wondering how you solved the issues around

  • making sure that during the tests you have access to the GPU - in terms of hardware and in terms of additional software dependencies
  • what do you do with tests which rely on the presence of a GPU on machines which do not have them? Are these tests disabled and if so, how?

Thanks much!


#8

@maedoc I think that did the trick, much appreciated!

@wds15

  • making sure that during the tests you have access to the GPU - in terms of hardware and in terms of additional software dependencies
  • what do you do with tests which rely on the presence of a GPU on machines which do not have them? Are these tests disabled and if so, how?

We are able to install the AMD SDK on Travis, which compiles the OpenCL code so that it runs on the CPU (see here). Jenkins already has OpenCL installed (it is a Mac) and also compiles for the CPU for testing.

This blog seems to have a decent explanation of how to test MPI-based code on Travis. For Jenkins we may have to install an MPI package.


#9

Hello all,

I’m working on problems in which GPU integration into Stan would be extremely helpful. I talked to some of the developers at StanCon2018 about lending a hand with the effort. How do I get started contributing?

Thanks,
Grant Schissler


#10

At this point we have everything working and are breaking down the GPU PR into smaller PRs for review purposes (while dotting some T’s and crossing some i’s along the way).

We could always use another reviewer!

You can see the PRs below:

  1. OpenCL Headers
  2. gpu setup
  3. matrix_gpu

The rest are TBD. I think at this point I would rather get a green light on these three and then build out the rest, so that I don’t have to merge/rebase two more PRs.

You can see the full PR here.


#11

Thanks for the quick reply! Very exciting that the implementation is getting more mature. I’ll review the PRs and update the forum soon.


#12

Is Jenkins still being run on a Mac? In my tests, when I check OS_TYPE in the makefile, it takes the Linux branch.

If it is no longer run on a Mac, can you install the AMD SDK like we do for Travis?


#13

@mitzimorris @seantalts

You can see here that the GPU testing is failing on Jenkins with

unknown file
C++ exception with description "build: The OpenCL application ended with the error: 
CL_PLATFORM_NOT_FOUND_KHR" thrown in the test body.

The CL_PLATFORM_NOT_FOUND_KHR error means that OpenCL cannot find an appropriate platform to compile on.

  1. Can you check that the AMD SDK is installed on the machine and symlinked?
  2. Can you check and update the graphics card drivers on the Jenkins machine? From here, that seems to be a possible issue.

Another option is for us to re-enable testing the OpenCL code on the CPU, which is pretty simple.
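
For the first check on a Linux node, a rough sketch (not something the Stan build provides): the Khronos ICD loader on Linux normally discovers platforms through `.icd` files in `/etc/OpenCL/vendors`, so a missing or empty directory is one common way to end up with CL_PLATFORM_NOT_FOUND_KHR. The function name `find_icd_files` is made up for illustration, and the directory is only the loader's usual default, which may differ on the Jenkins machine.

```python
from pathlib import Path


def find_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """List the .icd registration files in vendors_dir (hypothetical helper)."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No vendors directory at all: nothing is registered with the loader.
        return []
    # Each .icd file names a vendor library that the loader will try to open.
    return sorted(p.name for p in directory.glob("*.icd"))


icds = find_icd_files()
if icds:
    print("Registered OpenCL ICDs:", ", ".join(icds))
else:
    print("No ICD files found; the loader would likely report no platforms.")
```

An ICD file being present does not prove the platform works, since the library it points to still has to load, but an empty result narrows the problem down to the installation step.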


#14

That test ran on the Linux node. I thought we were running the GPU tests on the Mac Pro? If so, you should change the agent line to refer to gelman-group-mac.


#15

I’d like to be able to test it on either, though if that’s a hassle for now, no worries. I can switch it over to run only on the gelman-group-mac.


#16

I forget what the testing plan was, but I’m not sure if the Linux box has a GPU.


#17

Cool, no worries, we can just use the Mac.


#18

@bgoodri do you know if the Linux box has a GPU? Forgive me if we went over this already, I have a lot of loose threads I’m trying to keep track of.


#19

bgoodri@Stanubuntu16:~$ lspci -v | grep -F "VGA"
02:00.0 VGA compatible controller: NVIDIA Corporation GM107GL [Quadro K620] (rev a2) (prog-if 00 [VGA controller])

It does not, however, have a cooling system.
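
If the same check needs to run from a script, for example before deciding which node gets the GPU tests, a small sketch along these lines could parse the `lspci` output. The function `vga_controllers` is a hypothetical helper, and the sample string is the line from the output above.

```python
import re


def vga_controllers(lspci_output):
    """Return the description of each VGA controller line in lspci output."""
    pattern = re.compile(r"VGA compatible controller: (.+)$")
    found = []
    for line in lspci_output.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = (
    "02:00.0 VGA compatible controller: NVIDIA Corporation GM107GL "
    "[Quadro K620] (rev a2) (prog-if 00 [VGA controller])"
)
for description in vga_controllers(sample):
    print(description)
```

An empty list would mean `lspci` reported no VGA controller; it says nothing about whether drivers or an OpenCL platform are installed for the card.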


#20

Oh wait, I already installed the thing on the Linux box. Or at least, I ran the Travis script you linked to, but that only installed it for my current user and current session. What’s the appropriate way to install this for all users?