OpenCL and brms

Within-chain parallelization works extremely well in brms, including use of the threading function. It appears to be using the CPU and what I presume is TBB.

I would like to be able to direct brms to compile an OpenCL cmdstan model instead. Assuming it is not built in somehow, can we force that to occur by sending cmdstanr commands along with the brm function for compiling (e.g., stan_opencl = TRUE), or otherwise modifying the compiled file?

I haven’t had any luck so far. Thanks.


Yes, it is indeed TBB working under the hood. I don’t think we can link brms to OpenCL (yet), but if it is possible @wds15 surely knows how.

There is nothing preventing TBB & OpenCL from being used at the same time. That’s a matter of configuring cmdstanr … @rok_cesnovar ?

(and I am happy to hear that within-chain parallelisation is working so well for you)

The only things needed to run a model with OpenCL in cmdstanr are:

  • add STAN_OPENCL to cpp_options (the same way we now set STAN_THREADS for threading)
  • select the device with opencl_ids in the $sample() call (instead of threads_per_chain for threading)

There is a vignette for cmdstanr showcasing that: Running Stan on the GPU with OpenCL • cmdstanr

For now you would need to get the brms-generated Stan code and data and run them in cmdstanr the way it’s shown in this vignette.
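As a rough sketch of that workflow (not an official brms feature): the formula, family, and simulated data below are made up for illustration, and opencl_ids = c(0, 0) assumes the GPU is platform 0, device 0 on your machine, which you would need to check yourself.

```r
library(brms)
library(cmdstanr)

# Hypothetical simulated data for a logistic regression
set.seed(1)
n <- 10000
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(0.5 * dat$x))

# Ask brms for the Stan code and data without fitting anything
scode <- make_stancode(y ~ x, data = dat, family = bernoulli())
sdata <- make_standata(y ~ x, data = dat, family = bernoulli())

# Write the code to a temporary .stan file and compile it with OpenCL enabled
stan_file <- write_stan_file(scode)
mod <- cmdstan_model(stan_file, cpp_options = list(stan_opencl = TRUE))

# opencl_ids = c(platform id, device id); c(0, 0) is an assumption
fit <- mod$sample(
  data = unclass(sdata),  # pass a plain named list to cmdstanr
  chains = 4,
  parallel_chains = 4,
  opencl_ids = c(0, 0)
)
fit$summary()
```

The result is a plain cmdstanr fit rather than a brmsfit object, so the brms post-processing methods are not available on it directly.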

I guess step one would be to get a few models that use GLMs that we could evaluate. Do you maybe have a brms example that uses a Stan GLM function that we could use to evaluate @paul.buerkner?

A side note is that the Stan-to-C++ codegen is not fully optimized yet in stanc3 for all cases. Still, it should benefit models where lpdfs/lpmfs dominate the execution time, which for now means those that call GLMs. There are a lot of other optimizations in the backend that aren’t yet fully exposed.

We had a plan to optimize this in this release cycle, but I got too busy with other stuff and then a bit burned out a month or so ago… I am a bit sad we didn’t get that in, as this will be very nice once we finish it, with 60x+ speedups for logistic regression models like the redcard example.

I was meaning to open a brms issue once I had something more concrete, so we could discuss a plan for this, if this is something brms would want to support, but I am not there yet.

It would be great if we could support OpenCL in brms. And I don’t think it’s too complicated given that brms already supports threading and, as such, has all the code infrastructure in place to pass things to cmdstanr in the right places. Feel free to open an issue and I can hopefully implement it quickly.

brms always uses Stan GLM functions when permitted by the model. All the common examples should work out of the box. We can also just simulate some data for this purpose.
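One way to check this for a given model is to look at the generated Stan code for a `_glm_` function call. This is only a sketch: the simulated data and formula are invented for illustration, and which GLM function (if any) shows up depends on the family and the brms version.

```r
library(brms)

# Hypothetical simulated data for a simple linear regression
set.seed(2)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 + rnorm(n)

# Generate the Stan code only; nothing is compiled or sampled here
scode <- make_stancode(y ~ x1 + x2, data = dat, family = gaussian())

# Print the lines that call a Stan GLM function, if there are any
code_lines <- strsplit(as.character(scode), "\n", fixed = TRUE)[[1]]
glm_lines <- grep("_glm_l", code_lines, value = TRUE, fixed = TRUE)
print(glm_lines)
```

If no lines are printed, the model as specified does not use a GLM function and would likely benefit less from the current OpenCL support.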


Also going to take this opportunity to say that I love that Baseball Prospectus is using Stan. Huge fan here :)
