Within-chain parallelization with reduce_sum substantially increases runtime

Thanks for the suggestion! I looked at the threading vignette for brms, and it appears to parallelize the log-likelihood calculation automatically. Since my model's likelihood is lognormal, I assume it is not very expensive to evaluate (though I will confirm that with profiling). If so, I would not expect brms's approach to give me much of a speedup. Parallelizing only the likelihood is exactly what the "simple reduce_sum" model I posted does, and it resulted in a substantial slowdown.

The threading vignette also notes that grainsize usually needs tuning, so that is probably something I should experiment with.
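To make the grainsize trade-off concrete, here is a rough Python analogue of what reduce_sum does with a lognormal likelihood: split the data into slices, compute a partial log-likelihood for each slice, and add the pieces. This is only an illustration under simplifying assumptions. The function names are hypothetical, the fixed-size slicing is a stand-in (Stan treats grainsize as a hint to its scheduler rather than an exact slice size), and Python threads will not give a real speedup. The point is that the total is the same for every grainsize, and only the number of slices changes, and with it the scheduling overhead.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def lognormal_lpdf(y, mu, sigma):
    """Log-density of one lognormal observation y > 0."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def partial_sum(y_slice, mu, sigma):
    """Hypothetical analogue of a reduce_sum partial-sum function:
    the log-likelihood contribution of one slice of the data."""
    return sum(lognormal_lpdf(y, mu, sigma) for y in y_slice)


def sliced_log_lik(y, mu, sigma, grainsize, workers=4):
    """Split y into fixed slices of `grainsize` elements (a simplification of
    Stan's scheduler), evaluate each slice in a thread pool, and sum the results.
    Returns (total log-likelihood, number of slices)."""
    slices = [y[i:i + grainsize] for i in range(0, len(y), grainsize)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: partial_sum(s, mu, sigma), slices))
    return sum(parts), len(slices)


if __name__ == "__main__":
    y = [0.5 + 0.01 * i for i in range(1000)]
    full = partial_sum(y, 0.0, 1.0)
    for g in (1, 100, 1000):
        total, n_slices = sliced_log_lik(y, 0.0, 1.0, g)
        print(f"grainsize={g:4d}  slices={n_slices:4d}  diff={total - full:.2e}")
```

With grainsize 1, every cheap lognormal term becomes its own task, so the per-task bookkeeping can easily outweigh the arithmetic. That is one plausible reason a likelihood-only reduce_sum could end up slower than the serial model.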

I can also try implementing some of the changes suggested in the thread "How to most efficiently reduce_sum in a hierarchical logistic model".

Thanks again!