Brms on supercluster (MPI)

  • Operating System: “Red Hat Enterprise Linux Server 7.2 (Maipo)”
  • brms Version: 2.9.0
  • rstan version: 2.19.2

Recently I have been working on what are essentially generalized additive models (GAMs), initially using mgcv. However, for both computational and principled reasons I would like to move to a Bayesian framework. I am still pretty new to both GAMs and Bayesian inference, so specifying a GAM directly in rstan is beyond me, at least for now. I have therefore been using the amazing brms package to do things like this (reduced version):

Family: exponential
Links: mu = log
Formula: eeg06 ~ s(relright, k = 20) + s(relleft, k = 20)
Data: subset(eeg, select = c(deps[d], "relright", "relle (Number of observations: 5266)

However, it has quickly become clear that running these analyses on my office pc is not feasible: even a considerably reduced model takes about 4 days to run. I do have access to a supercluster, but my initial attempts don’t do much better.

From what I can find, the newest version of RStan supports MPI parallelization. My question is: does brms make use of this automagically (i.e., I cannot achieve any further performance gains)? And if not, can I set up brms for this? Or will I need to invoke rstan directly/‘manually’ to use the MPI features?

I would be most grateful for any guidance you can provide on this.

Joost Meekes

brm() has a future argument, that if set to TRUE will use the future package for parallelization. The future package appears to have support for MPI (makeClusterMPI), so that should provide an easy way to achieve this, though I’ve never tried.

2 Likes

Anecdotally, I’ve noticed much faster performance with GAMs in brms by specifying a smaller k. Unlike mgcv which “finds” a reasonable k, I don’t believe we expect brms to do the same. So it’s somewhat akin to passing a lot of linear predictors (in your case at least 40).

So one approach, though I can’t speak to it’s “principledness” might be to use mgcv to recommend a k and then use brms and LOO to compare models bracketing that k.

Not sure what your data are but 20 knots seems like a very tortuous relationship!

Thanks a lot for the helpful suggestion! I will try that, since I am having some trouble with the ‘cores’ argument, which is supposed to do the same.

However, both the ‘cores’ and the ‘future’ argument seem to concern running the chains in parallel, whereas if I understand it correctly, the MPI implementation in the newest versions of Stan perform the gradient evaluation in parallel (which seems to be the most computationally expensive step?).

That’s a good point, I have to admit that I had assumed this would work the same, since brms just uses the mgcv functions to set up the splines. Thinking (and reading) about it in more detail, the effective number of degrees of freedom is indeed only determined during fitting, based on penalization. On the other hand, the help for brms (specifically, for brms::set_prior) states that they are implemented ‘as explained in gamm’, which in turn says that this is based on a ‘Bayesian model of spline smoothing’ (which suggests to me that a suitable prior should induce suitable ‘penalization’, to mix terminology from two different statistical approaches. In any case, you are certainly right that it’s worth checking, and I’ve seen at least one model fit that would be congruent with your suggestion.

Regarding my choice of k: while I am indeed relying on the general advice in the mgcv documentation that it’s better to choose k too high than too low, what I am modeling are EEG responses to trials which contain multiple events. Since the EEG response to even a single event may show several ups and downs, I think the ‘optimal’ k might well be relatively high.

That is an interesting idea. At the present stage of this project I am mainly trying to figure out whether it is at all feasible to fit these kinds of models, so I am not too worried about ‘principledness’ :-)

You have to reformulate your model to use map_rect for parralelizing the gradient evaluation within stan. Otherwise brms/rstan/pystan run multiple chains in parallel

The below may be helpful if you would like to use threading within the model

1 Like

Thanks a lot! So that confirms my suspicion that brms does not use parallelize the gradient evaluation - nor even RStan, if I understand correctly. On the plus side, it suggests that I can still reap the huge benefits that parallelizing gradient evaluation should bring :-) Looks like I have some learning to do…