Cross-cluster contamination in a parallel cluster?

Hi all,

I am fitting different datasets to a Stan model (using cmdstanr) in parallel on my own PC (using parallel::makeCluster). After much debugging and trying different scenarios, it turns out that when I run the models in parallel there seems to be cross-cluster contamination, but only during the sampler phase that sets the initial values, step size, and inverse mass matrix. This gives the following error:

Error during model fitting: ‘init’ has the wrong length. See documentation of ‘init’ argument.

When I open 5 different instances of R/RStudio and run the models in parallel manually, without the clusters, the same error occurs. However, when I run the models sequentially by adding Sys.sleep() calls while still using the locally generated clusters, everything runs fine.
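
The working sequential variant looks roughly like the sketch below; the stagger length and the use of the loop index to offset each worker are my assumptions, not the exact code:

# Sketch of the workaround (assumed): each worker sleeps long enough that no two
# fits are in the adaptation phase at the same time.
foreach(run_id = 1:N_runs, .packages = "chkptstanr") %dopar% {
  Sys.sleep((run_id - 1) * 120)  # stagger start times; the delay length is a guess
  run_analysis(model = model_id, dataset = dataset_id, run = run_id)
}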

Because the code itself is quite long (and a reprex would be a lot of work), I'll first share some code snippets and potentially relevant information:
  • The models are pre-compiled.
  • I am using chkptstanr (version 0.2.0) to call cmdstanr; based on my testing I don't think the package itself is the issue, though perhaps there is some interaction.
  • All output paths are unique, and even the base chain names (output_basename) are unique (see the sketch below).
  • Each run uses a different seed.
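
To illustrate what I mean by unique paths and seeds, the per-job values are derived roughly like this (the names path_ and seed match the chkpt_stan() call further down; seed_table is a hypothetical lookup of mine, not the actual code):

# Every (model, dataset, run) combination gets its own checkpoint folder and seed.
path_ <- file.path("checkpoints", paste(model, dataset, run, sep = "_"))
seed  <- seed_table[[paste(model, dataset, run, sep = ".")]]  # hypothetical lookup table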

Does anyone have any ideas here?

# Packages
library(glue)
library(doSNOW)
library(foreach)
library(chkptstanr)
library(MASS)
library(cmdstanr)
library(dplyr)
library(stringr)
library(brms)

chkpt_stan(model_code = stan_model,
                   data = stan_data,
                   iter_adaptation = 150,
                   iter_warmup = warmup_iters,
                   iter_sampling = sampling_iters,
                   iter_per_chkpt = chkpt_iters,
                   parallel_chains = 4,
                   threads_per = 1,
                   chkpt_progress = TRUE,
                   control = NULL,
                   seed = seed,
                   stop_after = dynamic_stop,
                   reset = FALSE,
                   path = path_,
                   output_basename = paste0("chain_", model, ".", 
                                            dataset, ".",
                                            run, "."))

## set up parallel backend ---------------------------------------------------

if (hyper_parallel) {
  # one worker per model run in parallel; all workers write console output to the same file
  cluster = parallel::makeCluster(
    models_in_parallel,
    outfile = glue("output/consoleOut.txt")
  )
  doSNOW::registerDoSNOW(cluster)
}

# Nested approach

foreach(model_id = models, 
        .packages = c('chkptstanr', 'MASS', 'cmdstanr', 'dplyr', 'stringr', 'brms'), 
        .errorhandling = "stop") %:%
  foreach(dataset_id = 1:N_datasets, 
          .packages = c('chkptstanr', 'MASS', 'cmdstanr', 'dplyr', 'stringr', 'brms'), 
          .errorhandling = "stop") %:%
  foreach(run_id = 1:N_runs,
          .packages = c('chkptstanr', 'MASS', 'cmdstanr', 'dplyr', 'stringr', 'brms'),
          .errorhandling = "stop") %dopar% {
            
            run_analysis(model = model_id, dataset = dataset_id, run = run_id)
            
          } # foreach close
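
Not shown in the snippet: once the nested loop has finished, the cluster would normally be shut down (my addition here, for completeness):

# stop the workers after all jobs are done
if (hyper_parallel) parallel::stopCluster(cluster)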


Additional information:

  • Operating System: Windows 11
  • CmdStan version: 2.35.0
  • CmdStanR version: 0.8.1.9000

Interesting finding. I wonder if some cout calls in the source would help us figure out what goes into the init.


Would you be so kind as to clarify a bit more?
By "the source", do you mean the C++ file generated when the model is compiled?
And do you have any suggestions about what to cout?
I have never done this :)

Oh yes, this is probably something the devs need to test. Adding cout or logging calls to the source is not straightforward.