I'm routinely fitting between 5 and 50 relatively simple models with `brms` using the `cmdstanr` backend for multithreading. Each model (and its corresponding data) is independent of the others: different data, specifications, and outcomes. Sampling typically takes about 30 minutes per model. I haven't run into any convergence issues, and I have enough RAM that memory shouldn't be a problem.
To this point, I've used a `for()` loop to fit all of these models. The loop iterates over three length-K lists, fitting a different model to a different data frame and storing the result at each iteration: (1) a list of data.frames, (2) a list of different `brms` calls, and (3) an "empty" list to hold the resulting model objects. Using a loop ensures that I only fit one model at a time, sequentially.
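A minimal sketch of that sequential setup might look like the following; `data_list` and `formula_list` are illustrative names standing in for the length-K lists described above, and the `threads` value is just an example:

```r
library(brms)

# data_list:    length-K list of data.frames (hypothetical name)
# formula_list: length-K list of model formulas (hypothetical name)
fit_list <- vector("list", length(data_list))

for (k in seq_along(data_list)) {
  fit_list[[k]] <- brm(
    formula = formula_list[[k]],
    data    = data_list[[k]],
    backend = "cmdstanr",
    threads = threading(4)  # within-chain multithreading per model
  )
}
```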
My motivation for using a loop was to avoid issues with nested parallelization - trying to parallelize code that itself uses multithreading - but is this concern warranted? Are there any resources on implementing alternatives?
For example, instead of looping, I could use one of the parallelized apply functions (e.g. `future.apply::future_lapply()`) to fit each model in parallel using my lists. Alternatively, I could simply swap the loop for `lapply()`, which is still sequential but avoids explicit indexing. The appeal of doing so is (a) reducing the total time needed to fit all of the models and (b) "simpler" code without manual indexing, etc.
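For concreteness, here is a hedged sketch of the `future_lapply()` alternative. The list names (`data_list`, `formula_list`), worker count, and thread count are all assumptions for illustration; the key idea is to reduce per-model threads when running models in parallel so the workers don't oversubscribe the CPU:

```r
library(brms)
library(future)
library(future.apply)

plan(multisession, workers = 4)  # one R session per concurrently fitted model

fit_list <- future_lapply(
  seq_along(data_list),
  function(k) {
    brm(
      formula = formula_list[[k]],
      data    = data_list[[k]],
      backend = "cmdstanr",
      threads = threading(2)  # fewer threads per model than in the serial case
    )
  },
  future.seed = TRUE  # parallel-safe, reproducible RNG streams
)
```

With 4 workers at 2 threads each, total thread demand is roughly 8, so the nested-parallelism question becomes one of budgeting cores rather than correctness.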