I routinely fit many datasets to the same model, so I built a helper function to do this easily and more efficiently (i.e. queueing all chains of all models together so that no cores sit idle). In some cases I get a factor-of-2 reduction in wall clock time compared to just running num_cores / num_chains calls to sampling in parallel.
It is not polished enough to consider adding to an existing package, but I hope it might be useful to somebody. Some docs are in the file itself; a usage example follows. It should work on both Linux and Windows.
sampling_multi.R (5.1 KB)
library(rstan)
source("sampling_multi.R")
# Single model, many datasets
model <- stan_model(model_code = "
  data {
    real x;
  }
  parameters {
    real mu;
  }
  model {
    mu ~ normal(0, 1);
    x ~ normal(mu, 1);
  }
")
data_list <- list(
  list(x = 1),
  list(x = 4),
  list(x = 5)
)
fits <- sampling_multi(model, data_list)
fits[[1]]
fits[[3]]
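For comparison, here is a rough sketch of the baseline approach mentioned above: one sampling() call per dataset, with num_cores / num_chains calls running at once via the parallel package. The settings are hypothetical (8 cores, 4 chains), and mclapply forks, so this particular variant is Linux/Mac only:

library(parallel)
# Baseline: each call runs its 4 chains in parallel on 4 cores,
# and two such calls run concurrently (8 cores / 4 chains).
baseline_fits <- mclapply(
  data_list,
  function(d) sampling(model, data = d, chains = 4, cores = 4),
  mc.cores = 2
)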
# Many models, many datasets
model2 <- stan_model(model_code = "
  data {
    real x;
  }
  parameters {
    real mu;
  }
  model {
    mu ~ normal(5, 2);
    x ~ normal(mu, 2);
  }
")
model_list <- list(model, model, model2)
fits2 <- sampling_multi(model_list, data_list)
fits2[[3]]
# Custom processing: pass functions to process (map) the individual fits and to combine the results of the chains.
# There is built-in support for writing the fits to disk instead of keeping them in memory.
dir.create("fits")
filenames <- sampling_multi(model, data_list,
  map_fun = sampling_multi_store_file_generator(base_dir = "fits", base_name = "fit_"),
  combine_fun = c
)
filenames[[1]]
# Read the 1st chain of the 1st result
readRDS(filenames[[1]][1])
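As another illustration of the custom processing hooks, here is a hypothetical sketch that keeps only a per-chain summary instead of the full fits. It assumes map_fun receives a single-chain stanfit and combine_fun receives the list of per-chain results for one dataset, as in the file-storage example above:

# Keep only the posterior mean of mu from each chain and
# concatenate the per-chain means into one numeric vector per dataset.
mu_means <- sampling_multi(model, data_list,
  map_fun = function(fit) mean(rstan::extract(fit, pars = "mu")$mu),
  combine_fun = unlist
)
mu_means[[2]]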