Hi all,
I’m working with a large dataset of approximately 6,000 participants who rated about 60 ordinal items and performed a task yielding a continuous predictor. I want to fit a probit model with the brm() function using the following structure:
Rating ~ (1|item) + (1|subject) + z.continuous_predictor + gender + z.age
However, the model runs very slowly. I’m currently using cmdstanr and am considering the following options to speed things up:
- Using the pathfinder algorithm instead of MCMC.
- Splitting the 6,000 subjects into groups of 100 and running the model with brm_multiple().
Are these options reasonable? Do you have any additional suggestions?
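For context, a minimal sketch of such a call (with a placeholder data frame name dat and assuming a cumulative probit family for the ordinal ratings) might look like:

```r
library(brms)

# Ordinal probit model with crossed random intercepts for item and subject.
# `dat` is a placeholder name for the long-format data frame.
fit <- brm(
  Rating ~ z.continuous_predictor + gender + z.age + (1 | item) + (1 | subject),
  data    = dat,
  family  = cumulative("probit"),
  backend = "cmdstanr",
  chains  = 4,
  cores   = 4
)
```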
You can implement within-chain parallelization across subjects with reduce_sum():
https://mc-stan.org/docs/stan-users-guide/parallelization.html#reduce-sum
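In brms, reduce_sum() is exposed through the threads argument. A sketch (data frame name, family, and grainsize are placeholders, not your actual settings):

```r
library(brms)

# Within-chain parallelization via reduce_sum():
# here 2 chains, each split across 4 threads.
fit <- brm(
  Rating ~ z.continuous_predictor + gender + z.age + (1 | item) + (1 | subject),
  data    = dat,                           # placeholder data frame name
  family  = cumulative("probit"),
  backend = "cmdstanr",
  chains  = 2,
  cores   = 2,
  threads = threading(4, grainsize = 100)  # grainsize is a tuning parameter
)
```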
This will only take advantage of multiple CPU cores on the same node or machine. You will need map_rect() if you want to use a cluster with multiple nodes:
https://mc-stan.org/docs/stan-users-guide/parallelization.html#map-rect
Thank you. I have 8 CPU cores; should I prefer this within-chain parallelization over running 8 chains in parallel?
If so, how should I choose the exact form of within-chain parallelization?
This would add within-chain parallelization on top of between-chain parallelization, with the scheduler dividing the work across cores, though you might not see appreciable gains with only 8 CPU cores; you would just indicate how many chains to run in parallel as usual. However, I would not recommend breaking the data up into chunks, especially not as a substitute for running multiple chains the native way: the model shares the same 8 cores whether you run it in chunks or in one program.
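For concreteness, with 8 cores the two configurations would look roughly like this (same placeholder call as above):

```r
library(brms)

# A) Between-chain parallelization only: 4 chains, each on its own core.
fit_a <- brm(
  Rating ~ z.continuous_predictor + gender + z.age + (1 | item) + (1 | subject),
  data = dat, family = cumulative("probit"), backend = "cmdstanr",
  chains = 4, cores = 4
)

# B) Both combined: 4 chains x 2 threads per chain = 8 cores in total.
fit_b <- brm(
  Rating ~ z.continuous_predictor + gender + z.age + (1 | item) + (1 | subject),
  data = dat, family = cumulative("probit"), backend = "cmdstanr",
  chains = 4, cores = 4, threads = threading(2)
)
```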