OpenCL and brms

bachlaw · May 17, 2021, 1:15pm

Within-chain parallelization works extremely well in brms, including use of the threading function. It appears to be using the CPU and what I presume is TBB.

I would like to be able to direct brms to compile an OpenCL cmdstan model instead. Assuming it is not built in somehow, can we force that to occur by sending cmdstanr commands along with the brm function for compiling (e.g., stan_opencl = TRUE), or otherwise modifying the compiled file?

I haven’t had any luck so far. Thanks.

paul.buerkner · May 18, 2021, 7:10am

Yes, it is indeed TBB working under the hood. I don’t think we can link brms to OpenCL (yet), but if it is possible @wds15 surely knows how.

wds15 · May 18, 2021, 7:13am

There is nothing preventing the use of TBB & OpenCL at the same time. That’s a matter of configuring cmdstanr … @rok_cesnovar ?

(and I am happy to hear that within-chain parallelisation is working so well for you)

rok_cesnovar · May 18, 2021, 7:37am

The only thing needed to run a model with OpenCL in cmdstanr is:

add STAN_OPENCL to cpp_options (the same way we now set STAN_THREADS for threading)
select the device with opencl_ids in the $sample() call (instead of threads_per_chain for threading)

There is a vignette for cmdstanr showcasing that: Running Stan on the GPU with OpenCL • cmdstanr

For now you would need to get the brms-generated Stan code and data and run them in cmdstanr the way it’s shown in this vignette.
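The workaround described above can be sketched roughly as follows. This is a minimal sketch, not a brms feature: `fit_brms_opencl` is a hypothetical helper name, and it assumes brms, cmdstanr, CmdStan, and a working OpenCL runtime are installed. The default `opencl_ids = c(0, 0)` (platform ID, device ID) is also an assumption and depends on your machine.

```r
# Sketch: run a brms-generated model with OpenCL through cmdstanr.
# Assumes brms, cmdstanr, CmdStan, and an OpenCL runtime are available.

# Compile-time flag, analogous to stan_threads = TRUE for threading.
opencl_cpp_options <- list(stan_opencl = TRUE)

# Hypothetical helper (not part of brms or cmdstanr).
fit_brms_opencl <- function(formula, data, family,
                            opencl_ids = c(0, 0), ...) {
  # Let brms generate the Stan program and the matching data list.
  scode <- brms::make_stancode(formula, data = data, family = family)
  sdata <- brms::make_standata(formula, data = data, family = family)

  # Write the Stan code to a temporary file and compile it with OpenCL.
  stan_file <- cmdstanr::write_stan_file(as.character(scode))
  mod <- cmdstanr::cmdstan_model(stan_file, cpp_options = opencl_cpp_options)

  # opencl_ids = c(platform_id, device_id) selects the OpenCL device.
  mod$sample(data = unclass(sdata), opencl_ids = opencl_ids, ...)
}
```

A call might look like `fit_brms_opencl(y ~ x, data = df, family = brms::bernoulli(), chains = 4)`, where `df` is your own data frame. The result is a cmdstanr fit object rather than a brmsfit, so brms post-processing methods will not apply to it directly.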

I guess step one would be to get a few models that use GLMs that we could evaluate. Do you maybe have a brms example that uses a Stan GLM function that we could use to evaluate, @paul.buerkner?

A side note is that the Stan-to-C++ codegen in stanc3 is not yet fully optimized for all cases. Still, it should benefit models where lpdfs/lpmfs dominate the execution time, which for now means those that call GLMs. There are a lot of other optimizations in the backend that aren’t fully exposed yet.

We had a plan to optimize in this release cycle, but I got too busy with other stuff and then a bit burned out a month or so ago… I am a bit sad we didn’t get that in, as this will be very nice once we finish it, with 60x+ speedups for logistic regression models like the redcard example.

I was meaning to open a brms issue once I had something more concrete, so we could discuss a plan for this, if this is something brms would want to support, but not there yet.

paul.buerkner · May 18, 2021, 8:53am

It would be great if we could support OpenCL in brms. And I don’t think it’s too complicated, given that brms already supports threading and, as such, has all the code infrastructure in place to pass things to cmdstanr in the right places. Feel free to open an issue and I can hopefully implement it quickly.

brms always uses Stan GLM functions when permitted by the model, so all the common examples should work out of the box. We can also just simulate some data for this purpose.
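One quick way to check whether a given brms model ends up calling a Stan GLM function is to search the generated Stan code. The sketch below is illustrative only: `uses_stan_glm` is a hypothetical helper, the simulated data are made up for this purpose, and whether a GLM function appears depends on the model and the brms version.

```r
# Hypothetical helper: TRUE if the Stan code calls a *_glm_lpdf / *_glm_lpmf function.
uses_stan_glm <- function(stan_code) {
  grepl("_glm_lp[dm]f", paste(stan_code, collapse = "\n"))
}

# Only run the brms part if brms is installed.
if (requireNamespace("brms", quietly = TRUE)) {
  # Simulate a small logistic regression data set.
  set.seed(1)
  df <- data.frame(x = rnorm(100))
  df$y <- rbinom(100, size = 1, prob = plogis(0.5 * df$x))

  # Generate the Stan code without compiling or fitting anything.
  scode <- brms::make_stancode(y ~ x, data = df, family = brms::bernoulli())
  print(uses_stan_glm(scode))
}
```

Models for which this check comes back TRUE are the ones most likely to benefit from OpenCL at the moment, since the GLM functions are where the GPU support currently pays off.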

rok_cesnovar · May 18, 2021, 11:26am

Also going to take this opportunity to say that I love that Baseball Prospectus is using Stan. Huge fan here :)
