Running brms models on a computing cluster

Dear Stan community,

Hope this message finds you well!

I was wondering if someone has successfully run brms models on a university computing cluster. This kind of question does not seem to come up often on this forum. But given that brms models are very costly in terms of computing resources, and that it would be great to run multiple models in parallel, I suspect there is an established way to do this on a cluster.

If you happen to know this, would you mind sharing a couple of tutorials? Thanks a lot!

Best,
Claire


In my (admittedly limited) experience, cluster-to-cluster variation would make it very difficult to write a tutorial with enough detail to actually run a job and still have it be useful outside of a specific university’s cluster. The main hurdles for me center on setting up the R environment (incl. where and how to install packages like brms or cmdstanr) and properly requesting resources via the job scheduler (which varies depending on which one your cluster uses: LSF, Slurm, PBS, etc.).

Many university clusters have support staff who can be immensely helpful for working through any idiosyncrasies of setting up your R/Stan environment. There are good tutorials and example scripts out there for submitting an R script for many of the common job schedulers. For example, my university uses LSF and has prepared a set of publicly available written and video tutorials for running (parallel) R scripts geared towards beginners. To get this going with brms, you’ll mainly need to ensure that you’re matching the number of cores and threads to what you requested via the job scheduler.
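As a rough illustration of matching cores and threads to the scheduler request, here is a minimal Python sketch (Python only so that it runs without an R installation; the same logic carries over to R via `Sys.getenv()`). It reads the core count the scheduler granted and splits it into parallel chains and threads per chain. The function names are hypothetical, and the environment variables listed are the ones these schedulers typically export; which one your cluster actually sets is an assumption to check against its documentation.

```python
import os

# Environment variables that common schedulers typically set inside a job.
# Assumption: your cluster exports one of these; check its documentation.
SCHEDULER_CORE_VARS = (
    "SLURM_CPUS_PER_TASK",  # Slurm, set when --cpus-per-task is requested
    "LSB_DJOB_NUMPROC",     # LSF
    "PBS_NP",               # Torque/PBS
    "NCPUS",                # PBS Pro
)


def allocated_cores(env=None, default=1):
    """Return the number of cores the scheduler granted, or `default`."""
    env = os.environ if env is None else env
    for name in SCHEDULER_CORE_VARS:
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


def plan_chains(cores, chains=4):
    """Split `cores` into (chains run in parallel, threads per chain)."""
    parallel_chains = max(1, min(chains, cores))
    threads_per_chain = max(1, cores // parallel_chains)
    return parallel_chains, threads_per_chain


if __name__ == "__main__":
    cores = allocated_cores()
    parallel_chains, threads = plan_chains(cores)
    print(f"cores={cores} parallel_chains={parallel_chains} threads_per_chain={threads}")
```

In brms, the two numbers would correspond to the `cores` argument of `brm()` and to `threads = threading(...)`; as far as I know, within-chain threading also requires the cmdstanr backend.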

For any remaining idiosyncrasies, it’s helpful to be ready to give the support staff a general overview of what should happen (i.e., brms generates the Stan code and data in R, then hands them to Stan, which translates the model to C++, compiles it, and runs the sampler). A cluster I used a few years ago kept failing when brms tried to compile the model. Explaining this pipeline helped pinpoint the problem. For that particular cluster, the C++ compiler was deliberately not available on the compute nodes, meaning that I had to figure out how to separate the two steps; see the thread “Separating compilation & sampling with brms on cluster” (kudos to the helpful folks here on the forum).

I hope I’m wrong and that others do know of good tutorials. If I’m not being overly pessimistic, I wonder if folks with expertise would be interested in preparing tutorials to host among the tutorials on the Stan website. I’d love to see the same couple of models run with brms, cmdstan, and a Python interface, crossed with the most common job schedulers.


I very much recommend getting clustermq working in your setup. That is not entirely trivial, but it is worthwhile. Once you have that, you may benefit from these instructions:


Running models in parallel on a cluster seems trivial to me, unless I’m misunderstanding something? As @wpetry points out, it depends on what scheduling system / options your organization provides. Generally though, with 4 chains / 4 cores per model, you could run many models in parallel (up to 16 at a time) on even a single node w/ 64 cores with fairly simple R / Python scripts.
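A minimal sketch of such a driver script, assuming each model is fit by its own command and uses 4 cores (one per chain). The function names and the `Rscript fit_model.R` command mentioned in the comment are hypothetical; the placeholder commands just print a line so the sketch runs anywhere.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_models(node_cores, cores_per_model=4):
    """How many models fit on the node if each one uses `cores_per_model` cores."""
    return max(1, node_cores // cores_per_model)


def run_all(commands, node_cores=64, cores_per_model=4):
    """Run each command as a subprocess, capping how many run at once.

    Returns the exit codes in the same order as `commands`.
    """
    workers = max_concurrent_models(node_cores, cores_per_model)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda cmd: subprocess.run(cmd).returncode, commands)
        return list(results)


if __name__ == "__main__":
    # Placeholder commands so the sketch runs anywhere; on a cluster each one
    # might instead be ["Rscript", "fit_model.R", "<model id>"] (hypothetical).
    commands = [[sys.executable, "-c", f"print('model {i} done')"] for i in range(3)]
    print(run_all(commands))
```

Threads are enough here because each worker only waits on an external process; the heavy lifting happens in the subprocesses. The cap of `node_cores // cores_per_model` keeps the node from being oversubscribed, provided the job actually requested that many cores from the scheduler.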

What issues are you running into?

As the others have said, it will depend a lot on the specifics of your cluster and the policies its administrators have set. I’ve found that a Docker-based workflow has been helpful in my Stan/brms HPC use, as it tends to simplify a lot of the installation and compiler issues.
