Parallel runs with RStan

I have a single machine with lots of cores and want to repeatedly simulate data → fit a Stan model, ideally in parallel. Is that possible? How? I’ve looked and don’t see a clear answer; some threads, like this one, suggest it is not possible.

For clarity, I will refer to the whole simulate data → fit Stan → analyze sequence as a job; within each job, the Stan fit runs one or more chains.

Among my concerns:

  1. Compiled code. I would like to share it across the parallel jobs (the Stan model itself will be constant). One potential problem at the start: if two separate processes both try to compile the same code, they may step on each other.
  2. Intermediate files. Any files that get created (I don’t know whether there are any, though one response in the thread cited in the first paragraph indicates there are) would need unique names or unique locations. Both rstan and Stan itself might create files.
  3. What is an appropriate balance between parallel chains within a job and parallel jobs? I notice substantial time going into compilation (admittedly a one-time cost) and into packaging and pulling apart the results, so Amdahl’s law suggests there are limits to how much devoting resources to the chains can help. Naively, if my goal is to ensure maximum utilization of all processors, running a single chain per job would be best (or maybe 2 in succession to assess convergence), assuming many jobs.
  4. Other pitfalls, perhaps from sharing sockets or objects across threads. I’ve seen various mentions of such problems, though my bias would be to use processes, not threads, partly to avoid this kind of trouble.
  5. Properly randomizing Stan’s computations. I think simply generating a seed for each job and passing it to Stan will suffice, though I’m aware that’s a bit loose.
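On item 5, one low-tech way to make seeds both distinct and reproducible is to derive every job’s seed from a single master seed, drawing without replacement so no two jobs share one. A minimal Python sketch, used here only as a language-neutral illustration (`make_job_seeds` and the master seed value are hypothetical). In rstan, the resulting integer is the kind of value you would pass to `sampling()` through its `seed` argument.

```python
import random

def make_job_seeds(master_seed, n_jobs, upper=2**31 - 1):
    """Return n_jobs distinct integer seeds in [1, upper), reproducible from master_seed."""
    rng = random.Random(master_seed)            # private generator; global random state untouched
    return rng.sample(range(1, upper), n_jobs)  # sampling without replacement => no duplicate seeds

seeds = make_job_seeds(master_seed=20240101, n_jobs=8)
# Job i would pass seeds[i] to its Stan fit (and could derive its data-simulation seed from it).
```

Rerunning with the same master seed reproduces the whole set of jobs, while each job still gets its own random stream.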

A classic solution to item 2 would be to run each job in a separate directory; unfortunately, that would probably rule out the code sharing of item 1 (on the bright side, it would also rule out the feared collision in item 1).
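Separate directories need not rule out sharing compiled code: each job can get a private working directory for its own files while referring to one compiled model by path. A minimal Python sketch, assuming the compiled model already exists at a shared location; the function names and file names are hypothetical.

```python
import os
import tempfile

def make_job_dir(base_dir, job_id):
    """Create a fresh, uniquely named working directory for one job."""
    # mkdtemp always creates a new directory, even if two jobs use the same prefix
    return tempfile.mkdtemp(prefix=f"job{job_id:04d}_", dir=base_dir)

def job_paths(base_dir, job_id, shared_model_path):
    """Per-job file locations; only the compiled model path is shared."""
    workdir = make_job_dir(base_dir, job_id)
    return {
        "model": shared_model_path,                      # read-only, shared by every job
        "data": os.path.join(workdir, "sim_data.json"),  # per-job files live in workdir
        "draws": os.path.join(workdir, "draws.csv"),
    }
```

Because only the read-only model is shared, jobs cannot clobber each other’s intermediate files.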

There is an awkward workaround for the potential collision problem in item 1: run the first job alone, then let the rest run in parallel.
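A less manual variant of that workaround is a lock file: whichever process first creates the lock builds the compiled artifact, and the others wait until it appears. A minimal Python sketch of the pattern, assuming the compile step can be wrapped as a function that writes its result to a given path (`get_or_build` and `build` are hypothetical names, not part of rstan).

```python
import os
import time

def get_or_build(artifact, build, poll=0.1, timeout=600.0):
    """Return the path to artifact, building it at most once across processes.

    build(tmp_path) must write the finished artifact to tmp_path.
    """
    if os.path.exists(artifact):
        return artifact
    lock = artifact + ".lock"
    try:
        # O_CREAT | O_EXCL is atomic: exactly one process succeeds in creating the lock
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Another process is building; wait until the finished artifact appears
        deadline = time.monotonic() + timeout
        while not os.path.exists(artifact):
            if time.monotonic() > deadline:
                raise TimeoutError(f"gave up waiting for {artifact}")
            time.sleep(poll)
        return artifact
    try:
        if not os.path.exists(artifact):  # re-check: a previous builder may have just finished
            tmp = f"{artifact}.tmp{os.getpid()}"
            build(tmp)
            os.replace(tmp, artifact)     # atomic rename: no one sees a half-written file
    finally:
        os.close(fd)
        os.remove(lock)
    return artifact
```

Writing to a temporary name and renaming means a waiting job never loads a partly written file. A lock left behind by a crashed build has to be deleted by hand; until then, waiters give up only after `timeout`.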

On item 3: parallel jobs and parallel chains are not the only potential sources of parallelism. The libraries being called may themselves parallelize computations; I’m not sure whether they do. From monitoring CPU use, it looks as if each chain currently uses only one core.
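To put rough numbers on the Amdahl’s-law concern in item 3: if a fraction p of one job’s single-core run time is sampling that can be split across n parallel chains, the job speeds up by at most 1 / ((1 - p) + p / n). A minimal Python sketch; the p = 0.8 used below is purely illustrative, not a measurement.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial = 1.0 - parallel_fraction  # compilation, packaging, unpacking results, ...
    return 1.0 / (serial + parallel_fraction / workers)
```

For example, `amdahl_speedup(0.8, 4)` is about 2.5, and no number of chains can push it past 1 / (1 - 0.8) = 5. With many independent single-chain jobs, one job’s serial part overlaps with other jobs’ sampling instead of leaving cores idle.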

Finally, I’m not completely wed to using rstan, but that’s my preference.

Thanks for your thoughts.
Ross

Hi Ross,

I don’t know enough about parallel computing to be of any real help. I just want to flag one issue that may pop up with rstan: in contrast to cmdstan(r), rstan keeps all samples in memory, which can cause memory problems when several Stan fits run at the same time. Hopefully others can help with your actual questions.
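A back-of-envelope check of that memory concern, as a Python sketch. It assumes each stored value is an 8-byte double and ignores R, Stan, and any copies made while summarizing, so it is a lower bound; the function names and example sizes are hypothetical.

```python
def draws_memory_bytes(iter_kept, chains, n_quantities, bytes_per_value=8):
    """Rough size of the posterior draws one fit holds in memory."""
    # n_quantities: everything saved per draw (parameters, transformed parameters, generated quantities)
    return iter_kept * chains * n_quantities * bytes_per_value

def concurrent_memory_gib(n_jobs, **fit_kwargs):
    """Memory for n_jobs fits alive at the same time, in GiB."""
    return n_jobs * draws_memory_bytes(**fit_kwargs) / 2**30
```

With illustrative sizes of 1000 kept iterations, 4 chains, and 50,000 saved quantities, one fit needs about 1.5 GiB for its draws, and 16 concurrent fits about 24 GiB.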

You can run as many instances of R as you have resources for: just open separate terminal windows and start R in each one. Each instance uses its own temporary directory, and if you’ve successfully compiled a Stan model, each instance stores its own copy of the executable there.

Just don’t save the workspace to the default .RData when you quit your sessions; if the sessions share a working directory, they will overwrite each other’s .RData.
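The separate-sessions approach can also be scripted instead of opening terminal windows by hand. The Python sketch below is only an illustration: it launches each job as its own OS process (in the R setting the command would be something like `Rscript` on a job script), keeps at most `max_parallel` running, and has each job write to a file named after its job id, so nothing is overwritten the way a shared .RData would be. `WORKER`, `run_jobs`, and the result file names are hypothetical.

```python
import os
import subprocess
import sys
import time

# Stand-in for one job's script; it writes only to its own result file.
WORKER = r"""
import os, sys
job_id, out_dir = int(sys.argv[1]), sys.argv[2]
result = job_id ** 2  # placeholder for simulate -> fit -> analyze
with open(os.path.join(out_dir, f"result_{job_id:04d}.txt"), "w") as f:
    f.write(str(result))
"""

def run_jobs(job_ids, out_dir, max_parallel):
    """Run each job in its own OS process, keeping at most max_parallel alive."""
    pending = list(job_ids)
    running = []
    while pending or running:
        # Start new processes until the parallelism limit is reached
        while pending and len(running) < max_parallel:
            job = pending.pop(0)
            cmd = [sys.executable, "-c", WORKER, str(job), out_dir]
            running.append(subprocess.Popen(cmd))
        # Drop finished processes; stop on the first failure
        still_running = []
        for proc in running:
            code = proc.poll()
            if code is None:
                still_running.append(proc)
            elif code != 0:
                raise RuntimeError(f"job failed with exit code {code}: {proc.args}")
        running = still_running
        time.sleep(0.05)
    return sorted(os.listdir(out_dir))
```

Separate processes share no memory or sockets, which sidesteps the threading pitfalls raised in the question.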

I’m not 100% certain, but I think there have been some problems with calling the same model to fit multiple times with RStan (there was, or still is, an issue on GitHub).

But CmdStanR should not have these problems. Or you could even create a CmdStan workflow (e.g. in bash) and call it multiple times from the terminal.
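A CmdStan-based workflow needs only one compiled executable; what varies per job is the data file, the output file, and the seed. The Python sketch below only builds the command lines and runs nothing. The argument layout follows CmdStan’s documented `sample data file=... output file=... random seed=...` syntax, but the executable path and file names are hypothetical, so check the arguments against your CmdStan version.

```python
import os

def cmdstan_command(exe, data_file, output_file, seed):
    """Build the argv for one single-chain CmdStan run (does not execute it)."""
    return [
        exe, "sample",
        "data", f"file={data_file}",
        "output", f"file={output_file}",
        "random", f"seed={seed}",
    ]

def job_commands(exe, work_dir, seeds):
    """One command per job: shared executable, per-job data/output files and seed."""
    cmds = []
    for i, seed in enumerate(seeds):
        data = os.path.join(work_dir, f"data_{i:04d}.json")
        out = os.path.join(work_dir, f"output_{i:04d}.csv")
        cmds.append(cmdstan_command(exe, data, out, seed))
    return cmds
```

Each command list could be handed to `subprocess.run`, or joined into a line of a shell script, and run as many at a time as the machine can hold.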

@Ross_Boylan

I have fit models many times using HTCondor, with what I think is exactly your workflow. Some tips I have:

  1. Think about how you want to run in parallel. I use HTCondor, but this is overkill for you if you only have one machine. My worry about a simple foreach loop in R is that you’d run out of memory with RStan. There are other HPC tools that might work for you.
  2. Figure out your bottlenecks. You might be better off simulating all of your datasets first and then reading them in within the loop. This also lets you keep your simulated data. I agree with your choice to do this.
  3. Using RStan, you might be better off running one chain at a time and many jobs in parallel, depending upon your system (in contrast to multiple jobs each using 4 CPUs, one per chain). This leaves less idle CPU time, because all CPUs stay busy until the jobs are done.
  4. Figure out how to avoid recompiling your model with RStan. Putting your model in a package is the easiest way I can think of to do this while also avoiding locking your model file, but there are other methods as well.
  5. Look at using cmdstanr. See my recent post that links to the tutorial: Reduce_sum with occupancy model - Modeling - The Stan Forums (mc-stan.org). I found learning cmdstanr to be well worth my time.
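The idle-CPU argument in tip 3 can be checked with a toy scheduler. The Python sketch below (hypothetical function names; chain run times are made-up inputs, and compilation and other overhead are ignored) compares letting each core pick up the next single-chain job as soon as it is free against launching chains in groups, where a group’s cores are held until its slowest chain finishes.

```python
import heapq

def makespan_independent(durations, cores):
    """Each chain is its own job; a core takes the next chain as soon as it is free."""
    free_at = [0.0] * cores  # time at which each core becomes free (a min-heap)
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)

def makespan_grouped(durations, cores, chains_per_job):
    """Chains run in groups of chains_per_job; a group's cores are busy until its slowest chain ends."""
    if chains_per_job > cores:
        raise ValueError("a job cannot use more cores than exist")
    groups = [durations[i:i + chains_per_job] for i in range(0, len(durations), chains_per_job)]
    free_at = [0.0] * (cores // chains_per_job)  # number of jobs that fit side by side
    heapq.heapify(free_at)
    for g in groups:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + max(g))
    return max(free_at)
```

With chain times `[1, 1, 1, 4, 1, 1, 1, 4]` on 4 cores, grouped 4-chain jobs finish at time 8 and independent single-chain jobs at time 6. The price of single-chain jobs is that each fit loses the between-chain comparison used to assess convergence, which is why the question considers 2 chains in succession.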

@ahartikainen I’ve had this problem before. One workaround is to put the model into an R package. It also avoids recompiling and conflicting file names.
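Outside an R package, one way to get the same two benefits (no recompiling, no clashing file names) is to name the compiled artifact after a hash of the model source. A minimal Python sketch; the cache directory, the `.bin` suffix, and the function names are hypothetical, and the actual compile step is left out.

```python
import hashlib
import os

def model_cache_path(cache_dir, model_code):
    """Name the compiled artifact after a hash of the model source.

    Identical source maps to the same path (compile once, reuse); any edit maps
    to a new path, so different models never overwrite each other's compiled file.
    """
    digest = hashlib.sha256(model_code.encode("utf-8")).hexdigest()[:16]
    return os.path.join(cache_dir, f"model_{digest}.bin")

def needs_compile(cache_dir, model_code):
    """True if no compiled artifact exists yet for this exact model source."""
    return not os.path.exists(model_cache_path(cache_dir, model_code))
```

Combined with a lock around the compile step, this lets every job call the same code path and only the first one actually compiles.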