Separating compilation & sampling with brms on a cluster

I want to fit a brms model on a Linux cluster. The system I am working on only has a C++ compiler on the head node, not on the compute nodes (this seems non-negotiable). Based on a previous RStan post on this forum, I suspect it ought to be possible to separate compilation (on the head node) from sampling (on a compute node). I think I need to compile the model and write out an intermediate file on the head node, then load that file and sample on the compute node. Furthermore, I suspect this “intermediate file” is the dynamic shared object (DSO) referred to in help("brm").

How do I compile the model without starting sampling, write out the DSO, then load it back into an R session on another node?

Attempts towards an answer:
Here’s a sample model based on the brms docs that I am using as a testbed:

library(brms)

# Normal model with heterogeneous variances
data_het <- data.frame(
  y = c(rnorm(500), rnorm(500, 1, 2), rnorm(500, 50, 50)),
  x = factor(rep(c("a", "b", "c"), each = 500))
)

# Fit model
fit <- brm(bf(y ~ x, quantile = 0.25), data = data_het,
           family = asym_laplace(), chains = 8)

I’ve tried to prevent sampling by setting iter = 0, warmup = 0. Doing so throws an error, but the model does seem to have compiled without sampling:

> fit
 Family: asym_laplace 
  Links: mu = identity; sigma = identity; quantile = identity 
Formula: y ~ x 
         quantile = 0.25
   Data: data_het (Number of observations: 1500) 

The model does not contain posterior samples. 

  • Operating System: Springdale Linux 7.6 (Verona)
  • brms Version: 2.8.0

Hi wpetry

I haven’t tried the brms package before.

But I think you do not need to include your data and the fitting statement in the file you compile. That file should only contain the specification of your model.

Then use a second R file that first loads your compiled file, and then supplies your data and the fitting statement.

Hope this helps.

First prepare the model as follows:

fit_empty <- brm(..., chains = 0)

Then, you save fit_empty and load it on the compute nodes. There, you call

fit <- update(fit_empty, recompile = FALSE, ...)

where ... contains sampling arguments such as chains etc.
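To spell out the "save" and "load" steps, here is a minimal sketch of the two halves. The helper names compile_only() and sample_compiled() and the file name fit_empty.rds are made up for illustration; saveRDS()/readRDS() is just one way to move the object between nodes. It also assumes both nodes see the same file system and run matching R, rstan and brms versions.

```r
# Sketch only: helper names and "fit_empty.rds" are hypothetical;
# brm(), update(), saveRDS() and readRDS() are the real calls.

# Head node (has a compiler): compile without sampling, then save to disk.
compile_only <- function(formula, data, family, path = "fit_empty.rds") {
  fit_empty <- brms::brm(formula, data = data, family = family, chains = 0)
  saveRDS(fit_empty, path)
  invisible(path)
}

# Compute node (no compiler): reload the object and sample without recompiling.
sample_compiled <- function(path = "fit_empty.rds", ...) {
  loadNamespace("brms")  # registers the update() method for brmsfit objects
  fit_empty <- readRDS(path)
  update(fit_empty, recompile = FALSE, ...)
}
```

With library(brms) loaded, the head node would run something like compile_only(bf(y ~ x, quantile = 0.25), data_het, asym_laplace()), and the compute node would run fit <- sample_compiled(chains = 8).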


Thanks @paul.buerkner, this worked beautifully. In retrospect, chains = 0 makes more sense to avoid sampling than iter = 0.

I wonder if adding something to the reference manual entry for brm would be useful for others. Perhaps under the chains argument adding something like: “Setting chains = 0 will compile the model without sampling, which may be useful on systems where compilation and sampling must be separated” and adding under iter “Must be a positive integer. See chains for compiling model without sampling.”
