I'm routinely fitting between 5 and 50 relatively simple models with `brms` using the `cmdstanr` backend for multithreading. Each model (and its corresponding data) is independent of the others: different data, specifications, and outcomes. Sampling typically takes about 30 minutes per model. I haven't run into any convergence issues, and I have enough RAM that memory shouldn't be a problem.
To this point, I've used a `for()` loop to fit all of these models. The loop iterates over three length-K lists, fitting a different model to a different data frame and storing the result at each iteration: (1) a list of data.frames, (2) a list of different `brms` calls, and (3) an "empty" list to hold the resulting model objects. Using a loop ensures that I only fit one model at a time, sequentially.
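A minimal sketch of that sequential setup might look like the following; `data_list` and `formula_list` are illustrative names standing in for the length-K lists described above, and the `threads` value is just an example:

```r
library(brms)

# data_list:    length-K list of data.frames (hypothetical name)
# formula_list: length-K list of model formulas (hypothetical name)
fit_list <- vector("list", length(data_list))

for (k in seq_along(data_list)) {
  fit_list[[k]] <- brm(
    formula = formula_list[[k]],
    data    = data_list[[k]],
    backend = "cmdstanr",
    threads = threading(4)  # within-chain multithreading per model
  )
}
```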
My motivation for using a loop was to avoid issues with nested parallelization - trying to parallelize code that itself uses multithreading - but is this concern warranted? Are there any resources on implementing alternatives?
For example, instead of looping, I could use one of the parallelized apply functions (e.g. `future.apply::future_lapply()`) to fit each model in parallel using my lists. Alternatively, I could simply swap the loop for `lapply()`, which is still sequential but avoids explicit indexing. The appeal of doing so is (a) reducing the total time needed to fit all of the models and (b) "simpler" code without manual indexing, etc.
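For concreteness, here is a hedged sketch of the `future_lapply()` alternative. The list names (`data_list`, `formula_list`), worker count, and thread count are all assumptions for illustration; the key idea is to reduce per-model threads when running models in parallel so the workers don't oversubscribe the CPU:

```r
library(brms)
library(future)
library(future.apply)

plan(multisession, workers = 4)  # one R session per concurrently fitted model

fit_list <- future_lapply(
  seq_along(data_list),
  function(k) {
    brm(
      formula = formula_list[[k]],
      data    = data_list[[k]],
      backend = "cmdstanr",
      threads = threading(2)  # fewer threads per model than in the serial case
    )
  },
  future.seed = TRUE  # parallel-safe, reproducible RNG streams
)
```

With 4 workers at 2 threads each, total thread demand is roughly 8, so the nested-parallelism question becomes one of budgeting cores rather than correctness.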