I have a single machine with lots of cores and want to repeatedly simulate data → fit a Stan model, ideally in parallel. Is that possible? How? I’ve looked and don’t see a clear answer; some threads, like this one, suggest it is not possible.
For clarity, I will refer to the whole simulate data -> fit Stan -> analyze sequence as a job; within each job, the Stan fit runs 1 or more chains.
Among my concerns:
1. Compiled code. I would like to share it across the parallel jobs (the Stan model itself will be constant). There is a potential problem at the start if 2 separate processes both attempt to compile the same code: they may step on each other.
2. Intermediate files. Any that get created, possibly by the code (I don’t know if there are any, though the thread cited in the first paragraph has one response indicating there are), would need to have unique names or be in unique locations. Both rstan and Stan itself might create files.
3. What is an appropriate balance between parallel chains within a job and parallel jobs? I notice substantial time going into compilation (admittedly, a one-time cost) and into packaging and pulling apart the results, so Amdahl’s law suggests there are limits to what devoting more resources to the chains can gain. Naively, if my goal is to ensure maximum utilization of all processors, and there are many jobs, running each job with a single chain would be best (or maybe 2 chains in succession to assess convergence).
4. Other pitfalls, perhaps from sharing sockets or objects across threads. I’ve seen various mentions of such problems, though my bias would be to use processes, not threads, partly to avoid this kind of trouble.
5. Properly randomizing Stan’s computations. I think naively generating a seed to pass to Stan will suffice, though I’m aware that’s a bit loose.
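To make the last concern concrete, here is a minimal sketch of what I mean by naively generating seeds. It is in Python purely for illustration and uses only the standard library. The function name `make_job_seeds` is hypothetical, and I am assuming that positive 31-bit integers are acceptable as seeds. Distinct seeds do not by themselves guarantee independent random streams, which is the looseness I mentioned.

```python
import random


def make_job_seeds(master_seed, n_jobs):
    """Draw n_jobs distinct seeds, reproducibly, from one master seed."""
    rng = random.Random(master_seed)
    # sample() draws without replacement, so no two jobs share a seed.
    # Assumption: seeds in [1, 2**31 - 2] are valid for the sampler.
    return rng.sample(range(1, 2**31 - 1), n_jobs)
```

Each job would then receive one element of the returned list and pass it on to its fit.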
A classic solution to 2 would be to run each job in a separate directory; unfortunately that would probably mean the code sharing of 1 would not be possible (on the bright side, the feared collision in 1 would also not be possible).
There is an awkward solution to the potential collision problem in 1: ensure the first job is run alone; then allow the rest to run in parallel.
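Here is a rough sketch of that pattern, combined with a private scratch directory per job. It is in Python only for illustration, and nothing in it calls Stan: `compile_model`, `run_job`, and `run_all` are hypothetical stand-ins, the "compiled model" is just a small text file, and the seeds `1000 + j` are placeholders.

```python
import concurrent.futures as cf
import os
import tempfile


def compile_model(cache_dir):
    """Hypothetical stand-in for the one-time model compilation."""
    path = os.path.join(cache_dir, "model.compiled")
    if not os.path.exists(path):
        with open(path, "w") as fh:
            fh.write("compiled model")
    return path


def run_job(job_id, model_path, seed):
    """Hypothetical stand-in for one simulate -> fit -> analyze job."""
    # Private scratch directory, so intermediate files cannot collide.
    with tempfile.TemporaryDirectory(prefix=f"job{job_id}_") as workdir:
        scratch = os.path.join(workdir, "intermediate.txt")
        with open(scratch, "w") as fh:
            fh.write(f"job {job_id} seed {seed}\n")
        # The shared compiled artifact is only read, never rewritten.
        with open(model_path) as fh:
            model = fh.read()
        return {"job": job_id, "seed": seed, "model": model}


def run_all(n_jobs, cache_dir, max_workers=None):
    """Compile once, alone, then run the jobs in separate processes."""
    # Step 1: compile before any worker process exists.
    model_path = compile_model(cache_dir)
    # Step 2: fan the jobs out; each worker is a process, not a thread.
    with cf.ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, j, model_path, 1000 + j)
                   for j in range(n_jobs)]
        return [f.result() for f in futures]
```

In a script, `run_all` should be called under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on platforms that start workers by re-importing the main module. Whether the real compiled model can be shared read-only like this is exactly what I am unsure about.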
On 3: parallel jobs and parallel chains are not the only potential sources of parallelism. The libraries being called may themselves parallelize computations; I’m not sure whether they do. From monitoring CPU use, it does look as if each chain currently uses only one core.
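To spell out the Amdahl’s law reasoning in 3: suppose a fraction $p$ of one job’s wall time is sampling that can be spread over $c$ chains on separate cores, and the remaining $1 - p$ (packaging and pulling apart results, plus compilation when it happens) is serial. The symbols $p$ and $c$ are my own notation, and the numbers below are made up for illustration, not measurements. The best possible speedup of a single job is then

$$
S(c) = \frac{1}{(1 - p) + \dfrac{p}{c}}, \qquad \lim_{c \to \infty} S(c) = \frac{1}{1 - p}.
$$

For example, with $p = 0.8$ and $c = 4$, $S = 1 / (0.2 + 0.2) = 2.5$, so 4 cores give only a 2.5-fold speedup, whereas 4 independent single-chain jobs could in principle keep all 4 cores busy.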
Finally, I’m not completely wed to using rstan, but that’s my preference.
Thanks for your thoughts.