Stan - Memory usage when running on high performance clusters in parallel

Hi all,

I often run my Stan estimations on my university's high-performance cluster, using Stan via the rstan package for R.

The usual setup for Stan with, say, 4 chains lets you parallelize the estimation across the 4 chains by specifying the number of cores as an argument to the stan() function in rstan. The chains then run in parallel instead of sequentially.
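For concreteness, the call looks roughly like this (the model file name and data list are placeholders, not from my actual setup):

```r
library(rstan)

# Sketch of the usual single-job setup: 4 chains on 4 cores.
fit <- stan(
  file   = "model.stan",  # placeholder model file
  data   = my_data,       # placeholder data list
  chains = 4,
  cores  = 4              # one core per chain; chains run in parallel
)
```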

The problem arises when I try to do this on the cluster, which uses Slurm for job management. If I request 4 chains and 4 cores and submit everything as a single job, the memory reserved for a specific chain is not released when that chain finishes. Since it was all submitted as one job, other users have to wait until all 4 of my chains are done, and only then does the total memory reserved for the job become available again.

A workaround that I adopted, due to very persistent nagging from the cluster maintainers, was to submit each chain as an individual job. So with 4 chains, I copy my R script 4 times and change only the chain_id argument in the stan() call, so that each script runs a single chain. To avoid submitting every job manually, I submit them all at once using Slurm job arrays.
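A job-array submission along these lines is what I mean (the script name, resource numbers, and file layout are placeholders, not my actual setup):

```bash
#!/bin/bash
#SBATCH --job-name=stan-chains
#SBATCH --array=1-4            # one array task per chain
#SBATCH --cpus-per-task=1      # each task runs a single chain
#SBATCH --mem=4G               # reserved per task, freed when that task ends

# Pass the array index to R as the chain id; run_chain.R is a
# hypothetical script that calls stan(chains = 1, chain_id = <index>)
# and saves its single-chain fit to disk.
Rscript run_chain.R "${SLURM_ARRAY_TASK_ID}"
```

Because each chain is its own Slurm task, its memory is released as soon as that task finishes, which is exactly what the maintainers wanted.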

However, this becomes quite horrible to use in a simulation study with a detailed and delicate folder structure, with many subfolders and sub-subfolders indicating the different conditions of the study.

Also, after the estimation is complete, I have to run another script to merge the per-chain results, which can likewise get quite complicated with a sophisticated folder structure.
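The merging step can be sketched with rstan's sflist2stanfit(), assuming each per-chain job saved its single-chain fit with saveRDS() (the file names here are placeholders):

```r
library(rstan)

# Hypothetical per-chain result files written by the individual jobs
files  <- sprintf("fit_chain_%d.rds", 1:4)
sflist <- lapply(files, readRDS)

# Combine the single-chain stanfit objects into one 4-chain fit
fit <- sflist2stanfit(sflist)
```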

Therefore, I was wondering whether there is any way to make Stan free the memory reserved for a single chain as soon as that chain finishes, so that I can submit everything to the cluster normally as a single job instead of having to split the estimation into per-chain jobs this way.
