I don’t think brms allows for GPU acceleration, and in any case there are really only a couple of scenarios where a GPU helps with Stan. If your model does a Cholesky decomposition there’s some speedup available, and hierarchical models can indeed involve Cholesky decompositions, but unless you have a huge number of predictors in your design matrix I don’t think it’s worth exploring. Given that it sounds like you have very tall data (few predictor columns, lots of observation rows), `reduce_sum` is your best bet for speeding things up.
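For reference, here’s a minimal sketch of what that looks like from brms; the formula and data are placeholders, and you need the cmdstanr backend for within-chain threading:

```r
# brms generates reduce_sum-based within-chain parallelization for you
# via the `threads` argument (requires backend = "cmdstanr").
library(brms)

fit <- brm(
  y ~ x1 + x2 + (1 | group),  # placeholder formula; swap in your own model
  data    = my_data,          # placeholder data frame
  backend = "cmdstanr",
  chains  = 4,
  cores   = 4,
  threads = threading(2)      # 2 threads per chain -> reduce_sum under the hood
)
```

You can also tune the `grainsize` argument of `threading()` if the default chunking doesn’t scale well for your data.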
EDIT: Oops, I may have been too pessimistic about GPUs for tall data; I forgot that the GPU crew added support for accelerating GLMs. You might look into that, though I suspect it would be both easier and more performant to just use `reduce_sum` instead, since the GPUs seem to max out at roughly 10x speedups according to the GPU-Stan paper.
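If you do want to try the GPU route, recent brms versions expose OpenCL through the cmdstanr backend. Something along these lines should work, but treat it as a sketch: formula and data are placeholders, and you’ll want to verify the platform/device ids for your machine (e.g. with `clinfo`):

```r
# Hedged sketch, assuming a recent brms with OpenCL support and a working
# OpenCL driver; the GLM lpdf/lpmf functions are what get offloaded to the GPU.
library(brms)

fit_gpu <- brm(
  y ~ x1 + x2,                # placeholder GLM formula
  data    = my_data,          # placeholder data frame
  family  = bernoulli(),
  backend = "cmdstanr",
  opencl  = opencl(c(0, 0))   # platform id 0, device id 0; adjust for your setup
)
```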
EDIT 2: FYI, I made a post to check my intuition, and the GPU folks are a little less pessimistic than I am.