Parallelizing the same model fit for different data

Hi Stan community,

I am trying to fit the same model (using cmdstanr) to different data inputs data_list_all[[idx]]. I thought I could use a parallel for loop to save some time. In the code below I used foreach with %dopar% from library(parallel), library(foreach), and library(doParallel). It works when I shrink the data in each data_list_all[[idx]] (for testing), but with the full-length data the fits for a few data_list_all[[idx]] were not saved properly, because the CmdStan output files were not found in the temp folder. If I just run sequentially with the full data size, every fit works fine.

Do you know what’s going on here? Is there a better approach? My gut feeling is that parallel chains in Stan may not be fully compatible with a foreach loop, such that a finished chain for data_list_all[[2]] gets overwritten by a chain for data_list_all[[6]] before the other chains for data_list_all[[2]] complete.

Another idea is to index the parameters in the model and feed in all the data sets with matching indexing; as long as I don’t pool parameters across data sets, it should be identical to the for-loop solution. Do you think in that case I can tell Stan to use 4 cores per data_list_all[[idx]]?

Thank you very much :)

  fit_list_all <- foreach(
    idx = fit_idx
  ) %dopar% {
    mod$sample(
      data = data_list_all[[idx]], iter_warmup = 1000, iter_sampling = 1000,
      chains = 4, parallel_chains = 4, show_messages = FALSE
    )
  }

You may be right, even though R generates the file names randomly. You could try the output_dir or output_basename arguments of the sample() method on a cmdstan_model in cmdstanr.
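For example, a minimal sketch of how the loop from the original post could write each fit's CSV files to its own directory. This assumes mod, fit_idx, and data_list_all from the code above; the directory layout and basename scheme are my own invention, and I have not verified this against a running CmdStan installation:

```r
library(foreach)

fit_list_all <- foreach(idx = fit_idx) %dopar% {
  # Give each dataset its own persistent output directory, so parallel
  # workers never collide on CSV files in the shared tempdir.
  out_dir <- file.path("cmdstan_output", paste0("data_", idx))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  mod$sample(
    data = data_list_all[[idx]],
    iter_warmup = 1000, iter_sampling = 1000,
    chains = 4, parallel_chains = 4,
    output_dir = out_dir,                     # keep CSVs out of tempdir
    output_basename = paste0("fit_", idx),    # unique name per dataset
    show_messages = FALSE
  )
}
```

Since the files then live outside the session tempdir, they also survive if a worker's R session exits before the fit object is read back.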

I’m pinging @jgabry, who should know the answer here.

Also, you want to be careful not to spawn more jobs than you have cores. Even then, I find that my rather beefy Xeon-based iMac Pro can’t run 8 chains in parallel nearly as fast as it runs 1 sequentially. So you might not get much gain from parallelization if you’re close to or exceeding the number of cores you have.


I will try specifying the CmdStan output folder. Hopefully @jonah has a better solution.

You are absolutely right; this has already happened to me. I do the following to avoid the issue.

  parallel::detectCores()  # check how many cores are available
  n.cores <- parallel::detectCores() - 2
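To show where n.cores fits in, here is a sketch of registering a doParallel backend before the foreach loop. The cluster setup and teardown are assumptions on my part (not from the original post), and the loop body is elided:

```r
library(parallel)
library(doParallel)
library(foreach)

# Leave a couple of cores free so the machine stays responsive,
# and so workers x parallel_chains does not oversubscribe the CPU.
n.cores <- parallel::detectCores() - 2

cl <- parallel::makeCluster(n.cores)
doParallel::registerDoParallel(cl)  # makes %dopar% use this cluster

# ... run the foreach(idx = fit_idx) %dopar% { ... } loop here ...

parallel::stopCluster(cl)  # release the workers when done
```

Note that each worker here additionally runs parallel_chains = 4 CmdStan processes, so the effective core demand is roughly n.cores x 4 unless parallel_chains is reduced inside the loop.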