I have fit models many times using HTCondor, with what I think is exactly your workflow. Some tips:
- Think about how you want to run in parallel. I use HTCondor, but this is overkill if you only have one machine. My worry about a simple `foreach` loop in R is that you'd run out of memory with RStan. There are other HPC tools that might work for you.
- Figure out your bottlenecks. You might be better off simulating all of your datasets first and then reading them in within the loop. This also lets you keep your simulated data. I agree with your choice to do this.
- Using RStan, you might be better off running one chain per job and running many jobs in parallel, depending upon your system (in contrast to multiple jobs each using 4 CPUs, with one CPU per chain). This should leave less idle CPU time, because every CPU stays in use until the jobs are done.
- Figure out how to avoid recompiling your model with RStan. Putting your model in a package is the easiest way I can think of to do this without locking your model file, but there are other methods as well.
- Look at using `cmdstanr`. See my recent post that links to the tutorial: Reduce_sum with occupancy model - Modeling - The Stan Forums (mc-stan.org). I found learning `cmdstanr` to be well worth my time.
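
To make the "simulate first, then loop" tip concrete, here is a rough sketch in base R. It is only an illustration under assumptions: `sim_dir`, `simulate_one()`, `fit_one()`, and the file name `model.stan` are all made-up names, the simulator is a toy logistic example, and `fit_one()` is a stand-in for the real Stan fit so that the sketch runs without Stan installed.

```r
# Sketch only: sim_dir, simulate_one(), and fit_one() are hypothetical names.
set.seed(42)

# Write every simulated dataset to disk first, so the data are kept.
sim_dir <- file.path(tempdir(), "sims")
dir.create(sim_dir, showWarnings = FALSE, recursive = TRUE)

# Toy simulator: swap in your own data-generating process.
simulate_one <- function(n = 50) {
  x <- rnorm(n)
  list(N = n, x = x, y = rbinom(n, size = 1, prob = plogis(0.5 * x)))
}

n_sims <- 5
for (i in seq_len(n_sims)) {
  saveRDS(simulate_one(), file.path(sim_dir, sprintf("sim_%03d.rds", i)))
}

# Stand-in for the Stan fit. With cmdstanr this would be roughly:
#   mod <- cmdstanr::cmdstan_model("model.stan")  # compile once, outside the loop
#   fit <- mod$sample(data = dat, chains = 1)     # one chain per job
# ("model.stan" is an assumed file name, not from the original post.)
fit_one <- function(dat) {
  c(n = dat$N, mean_y = mean(dat$y))
}

# Read each saved dataset back in and fit it; only one dataset is in
# memory at a time.
sim_files <- sort(list.files(sim_dir, pattern = "^sim_[0-9]+\\.rds$",
                             full.names = TRUE))
results <- lapply(sim_files, function(f) fit_one(readRDS(f)))
names(results) <- basename(sim_files)
```

On a cluster, each job could instead take one file index as its argument and fit just that dataset, which is how the one-chain-per-job idea would fit in.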