Fitting multiple models with multithreading

I’m routinely fitting between 5 and 50 relatively simple models with brms using the cmdstanr backend for multithreading. Each model (and its corresponding data) is independent of the others: different data, specifications, and outcomes. Sampling typically requires about 30 minutes per model. I haven’t run into any convergence issues and I have enough RAM that memory shouldn’t be an issue.

To this point, I’ve used a for() loop to fit all of these models. The loop iterates through 3 length-K list to fit a different model to a different dataframe and store the result each iteration: (1) a list of data.frames, (2) a list of different brms calls, and (3) an “empty” list to hold resulting model objects. Using a loop ensures that I am only fitting one model at a time, sequentially.

While my motivation for using a loop was to avoid issues with nested parallelization - trying to parallelize code that itself employs multithreading - is this concern warranted? Is there any resource for implementing alternatives?

For example, instead of looping, I could use any of the parallelized apply functions (e.g. parLapply or future_lapply) to fit each model in parallel using my lists. Alternatively, I could simply swap out the loop for the vectorized-but-not-parallelized lapply(). The appeal of doing so is (a) reducing the time needed to fit all of the models and (b) “simpler” code without indexing etc.

Maybe you can share your current implementation (for loop and laaply?), so that others can detect any issues and leave comments on them.