The only things needed to run a model with OpenCL in cmdstanr are:
- enable OpenCL via cpp_options when compiling the model (the same way we now set
STAN_THREADS for threading)
- select the device with
opencl_ids in the
$sample() call (instead of threads_per_chain for threading)
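A minimal sketch of those two steps (the model file, data list, and device IDs are placeholders; adjust them for your setup):

```r
library(cmdstanr)

# Compile the model with OpenCL support enabled via cpp_options
mod <- cmdstan_model(
  "model.stan",                          # placeholder path to your Stan file
  cpp_options = list(stan_opencl = TRUE)
)

# Select the OpenCL platform and device with opencl_ids
fit <- mod$sample(
  data = data_list,                      # placeholder data list
  chains = 4,
  opencl_ids = c(0, 0)                   # c(platform_id, device_id)
)
```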
There is a cmdstanr vignette showcasing this: "Running Stan on the GPU with OpenCL".
For now you would need to get the brms-generated Stan code and data and run them in cmdstanr the way it is shown in that vignette.
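Until there is direct support, a rough sketch of that workflow (the formula, data frame, and family here are placeholders, assuming a simple logistic regression):

```r
library(brms)
library(cmdstanr)

# Generate the Stan code and data from brms without fitting the model
scode <- make_stancode(y ~ x1 + x2, data = df, family = bernoulli())
sdata <- make_standata(y ~ x1 + x2, data = df, family = bernoulli())

# Hand both over to cmdstanr, compiling with OpenCL as in the vignette
mod <- cmdstan_model(
  write_stan_file(scode),
  cpp_options = list(stan_opencl = TRUE)
)
fit <- mod$sample(data = as.list(sdata), opencl_ids = c(0, 0))
```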
I guess step one would be to get a few models that use GLMs that we could evaluate. Do you maybe have a brms example that uses a Stan GLM function that we could use for evaluation, @paul.buerkner?
A side note: the Stan-to-C++ code generation in stanc3 is not yet fully optimized for all cases. Still, it should benefit models where lpdfs/lpmfs dominate the execution time, which for now means those that call the GLM functions. There are a lot of other optimizations in the backend that aren't fully exposed yet.
We had a plan to work on those optimizations in this release cycle, but I got too busy with other things and then a bit burned out a month or so ago… I am a bit sad we didn't get that in, as this will be very nice once we finish it, with 60x+ speedups for logistic regression models like the redcard example.
I was meaning to open a brms issue once I had something more concrete, so we could discuss a plan for this if it is something brms would want to support, but I am not there yet.