Efficient fitting of multiple dataset/model combinations

I routinely fit many datasets to the same model, so I built a helper function that makes this easier and more efficient: it queues all chains of all fits together so that no cores sit idle. In some cases I get a factor of 2 reduction in wall-clock time compared to just running num_cores / num_chains calls to sampling in parallel.
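
To illustrate the queueing idea, here is a minimal sketch using only the base `parallel` package. It is not the code from sampling_multi.R: `run_task`, `run_chain_queue` and `fake_chain` are hypothetical names made up for this example, and the "chain" is a dummy function standing in for a real single-chain fit.

```r
library(parallel)

# Hypothetical worker wrapper: runs one (fit, chain) task.
run_task <- function(task, run_chain) run_chain(task$fit, task$chain)

# Hypothetical helper: one task per (fit, chain) pair, all in a single queue.
run_chain_queue <- function(n_fits, n_chains, run_chain, cores = 2) {
  tasks <- expand.grid(chain = seq_len(n_chains), fit = seq_len(n_fits))
  task_list <- lapply(seq_len(nrow(tasks)), function(i) tasks[i, ])
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl), add = TRUE)
  # clusterApplyLB gives the next task to whichever worker is free first,
  # so a core never waits for the slowest chain of "its" fit to finish.
  results <- clusterApplyLB(cl, task_list, run_task, run_chain = run_chain)
  # Regroup the flat result list into one list of chains per fit.
  split(results, tasks$fit)
}

# Dummy stand-in for a single-chain fit (assumption, for illustration only).
fake_chain <- function(fit, chain) {
  Sys.sleep(0.1)
  c(fit = fit, chain = chain)
}

chains_by_fit <- run_chain_queue(n_fits = 3, n_chains = 4, run_chain = fake_chain)
length(chains_by_fit)       # one entry per fit
length(chains_by_fit[[1]])  # one entry per chain
```

With rstan, the dummy function could presumably be replaced by a call to `sampling()` with `chains = 1` and a distinct `chain_id`, and the chains of each fit merged with `sflist2stanfit()`; the attached file may well do this differently.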

It is not polished enough to consider adding to an existing package, but I hope it might be useful to somebody. Some documentation is in the file itself; a usage example follows. It should work on both Linux and Windows.

sampling_multi.R (5.1 KB)

library(rstan)
source("sampling_multi.R")

# Single model, many datasets

model <- stan_model(model_code = "
data {
  real x;
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 1);
  x ~ normal(mu, 1);
}
")

data_list <- list(
  list(x = 1),
  list(x = 4),
  list(x = 5)
)

fits <- sampling_multi(model, data_list)
fits[[1]]
fits[[3]]

# Many models, many datasets (one model per dataset, matched by position)

model2 <- stan_model(model_code = "
data {
  real x;
}
parameters {
  real mu;
}
model {
  mu ~ normal(5, 2);
  x ~ normal(mu, 2);
}
")

model_list <- list(model, model, model2)

fits2 <- sampling_multi(model_list, data_list)

fits2[[3]]

# Custom processing: pass functions to process (map) individual fits
# and to combine the results of the chains.
# There is built-in support for writing the fits to disk instead of keeping them in memory.
dir.create("fits", showWarnings = FALSE)
filenames <- sampling_multi(
  model, data_list,
  map_fun = sampling_multi_store_file_generator(base_dir = "fits", base_name = "fit_"),
  combine_fun = c
)

filenames[[1]]

# Read the first chain of the first result
readRDS(filenames[[1]][1])