Within-chain parallelization works extremely well in brms, including use of the threading function. It appears to be using the CPU and what I presume is TBB.
I would like to be able to direct brms to compile an OpenCL cmdstan model instead. Assuming it is not built in somehow, can we force that to occur by sending cmdstanr commands along with the brm function for compiling (e.g., stan_opencl = TRUE), or otherwise modifying the compiled file?
For now, you would need to get the brms-generated Stan code and data and run them in cmdstanr the way it's shown in this vignette.
I guess step one would be to get a few models that use GLMs that we could evaluate. Do you maybe have a brms example that uses a Stan GLM function that we could use to evaluate @paul.buerkner?
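A minimal sketch of that workaround, assuming brms and cmdstanr are installed, CmdStan is configured, and an OpenCL runtime and device are available. The helper name `fit_with_opencl` and the default platform/device IDs `c(0, 0)` are placeholders of mine, not anything brms provides:

```r
# Hypothetical helper: generate Stan code and data with brms, then compile
# and sample with cmdstanr using OpenCL. Not part of brms or cmdstanr.
fit_with_opencl <- function(formula, data, family, opencl_ids = c(0, 0), ...) {
  # Stan program and data list that brms would generate for this model
  scode <- brms::make_stancode(formula, data = data, family = family)
  sdata <- brms::make_standata(formula, data = data, family = family)

  # Write the program to a temporary .stan file and compile with OpenCL on
  stan_file <- cmdstanr::write_stan_file(unclass(scode))
  mod <- cmdstanr::cmdstan_model(
    stan_file,
    cpp_options = list(stan_opencl = TRUE)
  )

  # opencl_ids = c(platform_id, device_id); c(0, 0) is only an assumed default
  mod$sample(data = unclass(sdata), opencl_ids = opencl_ids, ...)
}
```

A call could look like `fit_with_opencl(y ~ x, data = d, family = brms::bernoulli())`, where `d` is your own data frame. The result is a cmdstanr fit object rather than a `brmsfit`, so the brms post-processing methods are not available on it directly.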
A side note is that the Stan-to-C++ codegen in stanc3 is not fully optimized yet for all cases. Still, it should benefit models where lpdfs/lpmfs dominate the execution time, which for now means those that call GLMs. There are a lot of other optimizations in the backend that aren't fully exposed yet.
We had a plan to optimize this in this release cycle, but I got too busy with other stuff and then a bit burned out a month or so ago… I am a bit sad we didn't get that in, as this will be very nice once we finish it, with 60x+ speedups for logistic regression models like the redcard example.
I was meaning to open a brms issue once I had something more concrete, so we could discuss a plan for this, if this is something brms would want to support, but I am not there yet.
It would be great if we could support OpenCL in brms. And I don’t think it's too complicated, given that brms already supports threading and, as such, has all the code infrastructure in place to pass things to cmdstanr in the right places. Feel free to open an issue and I can hopefully implement it quickly.
brms always uses Stan GLM functions when permitted by the model. All the common examples should work out of the box. We can also just simulate some data for this purpose.
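As a rough sketch of the simulation route, the base R snippet below generates data for a logistic regression. The sizes, seed, and variable names are arbitrary choices of mine, not taken from any existing benchmark:

```r
# Simulate a logistic regression data set for benchmarking (arbitrary sizes).
set.seed(2024)
n <- 10000  # observations
k <- 10     # predictors

X <- matrix(rnorm(n * k), nrow = n, ncol = k)
colnames(X) <- paste0("x", seq_len(k))

beta <- rnorm(k)                 # "true" coefficients
eta <- drop(X %*% beta)          # linear predictor
y <- rbinom(n, size = 1, prob = plogis(eta))

sim_data <- data.frame(y = y, X)

# Formula y ~ x1 + ... + x10, built from the column names
sim_formula <- reformulate(colnames(X), response = "y")
```

Passing `sim_formula` and `sim_data` to brms with `family = bernoulli()` should give a model of the kind discussed here. It is worth inspecting the generated Stan code to confirm that a GLM function is actually used.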