Hello, this seems like it should be simple, and it is related to this question. I am trying to run a sensitivity analysis for several parameters using rstan on an HPC cluster. I am generating multiple simulated data sets and applying four Stan models to each dataset using:
```r
map(seq_along(list.of.simdata), function(x) {
  sampling(model1,
           data = list.of.simdata[[x]],
           iter = n_iter, warmup = n_wmp, chains = n_c, seed = 8029,
           control = list(adapt_delta = 0.99, max_treedepth = 16))
})
```
The result is a list of `stanfit` objects (with length equal to the length of `list.of.simdata`). I have been trying to use `saveRDS` to save each of these lists, but the resulting file is usually more than 6 GB and eventually overruns my available disk space. I've tried adding `compress = "xz"` to the `saveRDS` call (because this seems to achieve the greatest amount of compression).
Unfortunately, that dramatically increases the time it takes to save the file. I eventually need to compare the posterior draws for about 15 parameters to the originally simulated values, so all I really need is the draws along with the sampler parameters, to make sure I didn't get any warnings (divergences, low BFMI, etc.). I'm wondering what the best way is to retain the ability to access samples, evaluate sampler performance, and run diagnostics (e.g., traceplots) without exhausting disk space, while still being able to open the files on my local machine in a new R session.
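To be concrete about the diagnostics I need, I was picturing something like the sketch below, where `fit` is one of the `stanfit` objects and the helper name is just mine:

```r
library(rstan)

# Sketch of a helper that pulls only the post-warmup sampler diagnostics
# I care about (divergences, treedepth, energy for BFMI) from one stanfit.
get_diag_summary <- function(fit) {
  sp <- get_sampler_params(fit, inc_warmup = FALSE)  # one matrix per chain
  list(
    n_divergent   = sum(sapply(sp, function(ch) sum(ch[, "divergent__"]))),
    max_treedepth = max(sapply(sp, function(ch) max(ch[, "treedepth__"]))),
    energy        = lapply(sp, function(ch) ch[, "energy__"])  # for BFMI checks
  )
}
```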
Is it better to just extract the draws and sampler info and save those as their own objects? What are the drawbacks of doing that (rather than finding an efficient way to compress the list of `stanfit` objects)? Any pointers would be most appreciated.
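For reference, the extraction approach I'm considering would look roughly like this (just a sketch, not tested at scale; `pars_of_interest` and `list.of.fits` are placeholder names for my parameters and the fitted list):

```r
library(rstan)
library(purrr)

# Sketch: keep only the draws for the ~15 parameters of interest plus the
# sampler diagnostics, and save this much smaller list instead of the
# full stanfit objects.
slim_fit <- function(fit, pars_of_interest) {
  list(
    draws   = as.array(fit, pars = pars_of_interest),          # iterations x chains x pars
    sampler = get_sampler_params(fit, inc_warmup = FALSE),     # divergences, energy, etc.
    summary = summary(fit, pars = pars_of_interest)$summary    # Rhat, n_eff, quantiles
  )
}

slim_list <- map(list.of.fits, slim_fit, pars_of_interest = c("alpha", "beta"))
saveRDS(slim_list, "sim_fits_slim.rds", compress = "xz")
```

The `as.array` form keeps the chain dimension, so traceplots should still be possible from the saved draws.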