Rstan - nested parallelization possible?


#1

Operating System: RHEL7
Interface Version: 2.15.1

Is it possible to nest parallelization with rstan? I have multiple jobs to run in parallel, and I would like each job to start Stan with 2 chains. The jobs run in parallel in a foreach loop that uses the doParallel backend. I would like to use 2 CPUs per Stan run, but this always fails; it looks to me as if rstan interferes with doParallel.

Has anyone done this or is this not possible to do?

Thanks!
Sebastian


#2

You may want to ping @bgoodri or @jonah directly—they should know.


#3

@wds15 did you manage to work this out? I have the same kind of problem (I think). I have 5 models (each with 4 chains in parallel) that I would like to run in parallel with foreach and %dopar%.

I found this article interesting: https://www.r-bloggers.com/can-you-nest-parallel-operations-in-r/. It appears that nesting is not possible with parLapply (I don’t know whether rstan uses this) but is possible with foreach, because foreach manages sockets more robustly. So if rstan used foreach, this could be accomplished.


#4

Hi,

I managed to achieve this by using mclapply from the parallel package. In short, if you want to nest two jobs with two chains in parallel, you would do something like this:

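library(parallel)
library(rstan)

# Outer level: mclapply runs 2 jobs at a time (mc.cores = 2).
# Inner level: each sampling() call runs its 2 chains on 2 cores.
fits <- mclapply(
  seq_len(N_data_sets),
  function(x) sampling(object = stan_model,
                       data = data_set[[x]],
                       cores = 2,
                       chains = 2),
  mc.cores = 2
)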

Here you set the number of chains and cores within the sampling function, and set the number of jobs that run at the same time with mc.cores in mclapply.

On Ubuntu, I needed to run it from the terminal; running it in RStudio gives me an error.

Hope it helps!


#5

Hi!

I gave up on this and moved on to different parallelization schemes (I have forgotten the details). The issue is that the sampling method in rstan itself creates the cluster it uses for parallelization.

We should probably improve rstan here and make it check whether a default cluster is already defined.
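
To illustrate what a "default cluster" means here: the parallel package lets a session register one cluster that functions such as parLapply fall back to when no cluster is passed. The sketch below uses only the parallel package that ships with R. It shows the mechanism only and is not something rstan does.

```r
library(parallel)

# Create a small socket cluster and register it as the session default.
cl <- makeCluster(2)
setDefaultCluster(cl)

# With cl = NULL, parLapply uses the registered default cluster
# instead of requiring the caller to pass (or create) one.
squares <- parLapply(cl = NULL, X = 1:4, fun = function(i) i^2)

# Unregister the default and shut down the workers when done.
setDefaultCluster(NULL)
stopCluster(cl)
```

A package that looked for such a default before creating its own workers could, in principle, reuse a cluster that the user had already set up.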

However, the best solution I am aware of is to run each chain separately and then merge the results using the tools in rstan… but I have opted against this hassle. The solution from @tiagocc is worth a try; I am just cautious about seed control and the like. I am not sure how that is handled in the proposed solution (I am not saying it’s not working, I just don’t know).
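
A minimal sketch of the chain-by-chain approach, for anyone who wants to try it. It assumes a Unix-like system (mclapply does not fork on Windows) and uses a toy model and toy data invented purely for illustration. Each task runs one chain serially with the same seed and a distinct chain_id, and sflist2stanfit merges the single-chain fits.

```r
library(parallel)
library(rstan)

# Toy model and data, purely for illustration.
model_code <- "
data { int<lower=1> N; vector[N] y; }
parameters { real mu; }
model { y ~ normal(mu, 1); }
"
toy_model <- stan_model(model_code = model_code)
set.seed(1)
toy_data <- list(N = 20, y = rnorm(20))

n_chains <- 4
seed <- 1234  # same seed for every chain; chain_id distinguishes the chains

# One task per chain, each sampled serially (cores = 1) in a forked worker,
# so sampling() never needs to start a cluster of its own.
chain_fits <- mclapply(seq_len(n_chains), function(i) {
  sampling(toy_model, data = toy_data, chains = 1, cores = 1,
           chain_id = i, seed = seed, refresh = 0)
}, mc.cores = 2)

# Merge the single-chain fits into one stanfit object.
fit <- sflist2stanfit(chain_fits)
```

With several data sets, the same idea extends to one task per (data set, chain) pair, followed by a merge per data set. That flattens the two levels of parallelism into one.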

Best,
Sebastian


#6

I am not sure what parallelisation function rstan uses, but foreach with doParallel should solve the problem and make nested parallelisation much easier (link above).

It is often necessary to test N models at the same time. So this possibility would be incredibly useful.


#7

I did not make these design decisions for rstan, but the problem with foreach and doParallel is that they add dependencies to the package. The parallel package, which provides the mclapply used by rstan, is a de facto R standard to my knowledge.

… but I agree that a foreach approach is a good thing (ideally with doRNG). Still, most of the time I was able to rethink the problem and work around it in a meaningful way. Parallelization is good to avoid thinking about problems - which is a great way to save time.
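
For reference, a small sketch of what doRNG adds to foreach: reproducible R-level random numbers across parallel iterations. It assumes the foreach, doParallel and doRNG packages are installed and deliberately contains no Stan code. Note that Stan's own sampler is controlled through the seed argument of sampling, so this only covers randomness generated in R.

```r
library(foreach)
library(doParallel)
library(doRNG)

registerDoParallel(cores = 2)

# %dorng% gives each iteration its own reproducible RNG stream,
# derived from the seed set with set.seed().
set.seed(42)
draws_a <- foreach(i = 1:4, .combine = c) %dorng% rnorm(1)

set.seed(42)
draws_b <- foreach(i = 1:4, .combine = c) %dorng% rnorm(1)

stopImplicitCluster()

# Compare the two runs; as.numeric() drops the attributes doRNG attaches.
same_draws <- identical(as.numeric(draws_a), as.numeric(draws_b))
```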