Pass incomplete estimates as initial values for the next round of iterations

I am sending my Stan scripts to run on a supercomputer. However, I don’t know how to estimate how much time the supercomputer will require for the computation (which is one of the parameters you need to pass to the supercomputer), so I am often timed out. Therefore, I would like to write my code so that it saves the results of the estimation after every, say, 1000 iterations and then passes these still-incomplete estimates as initial values for the next iterations. That way, if I am timed out, I will at least have access to the calculations done up to that point, and I will be able to restart the estimation from where it stopped.

However, I can’t find a way to pass the results of the previous computations as initial values for the subsequent computations. Simply passing the fit object to the init argument of the stan functions does not work, and neither does passing the summary of the fit object. How can I ‘reshape’ the fit object so that its estimates can be used as initial values for further computations?

is this about warmup or sampling iterations? this comes up a lot, and there’s probably a lot of good material in previous posts about what you can/can’t achieve.

for CmdStanPy, we’re working on operationalizing this, cf. this feature request

for now, this could be scripted as follows:

  1. run the sampler for 1000 warmup iterations and 1 sampling iteration.

  2. extract the following information from the resulting CmdStanMCMC object:

  • the current best estimate of the Stan program parameters. the method stan_variable(var=<param_name>) will return a numpy.ndarray over all draws where each element of the array has the correct Stan variable structure.

  • the stepsize and metric, available as CmdStanMCMC object properties step_size and metric

  • seed, chain_id for the PRNG used by the sampler. the Stan algorithms don’t record the PRNG state, which complicates the question of how best to resume sampling - perhaps using the same seed and chain_id*2 would be adequate

  3. restart the sampler, specifying seed, chain_id, step_size, metric, and initial parameter values.
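
A minimal sketch of the "initial parameter values" part of step 2, assuming the draws have already been pulled out with stan_variable. The helper name last_draw_as_inits is made up, and the usage comment uses placeholder names (model, data, fit, seed, old_chain_id, and parameters "mu"/"tau") plus the default diag_e metric. It takes the final draw of each parameter, i.e. where the sampler stopped, and converts it to plain Python values that can go straight into inits.

```python
from typing import Any, Dict, Mapping


def last_draw_as_inits(draws_by_param: Mapping[str, Any]) -> Dict[str, Any]:
    """Turn the final draw of each parameter into an inits dict.

    Each value is expected to look like the output of
    fit.stan_variable(var=name): the first axis indexes draws and the
    remaining axes carry the parameter's own Stan shape.
    """
    inits: Dict[str, Any] = {}
    for name, draws in draws_by_param.items():
        last = draws[-1]  # final draw along the draw axis
        # numpy arrays and numpy scalars have .tolist(); plain Python values pass through
        inits[name] = last.tolist() if hasattr(last, "tolist") else last
    return inits


# Hypothetical CmdStanPy usage. All names here are placeholders, and the
# metric line assumes the default diag_e metric (fit.metric is chains x dim):
#   draws = {name: fit.stan_variable(var=name) for name in ("mu", "tau")}
#   inits = last_draw_as_inits(draws)
#   fit2 = model.sample(data=data, chains=1, inits=inits,
#                       seed=seed, chain_ids=[2 * old_chain_id],
#                       step_size=float(fit.step_size[0]),
#                       metric={"inv_metric": fit.metric[0].tolist()},
#                       iter_warmup=0, adapt_engaged=False)
```

Taking the last draw (rather than, say, a mean from the summary) is what lets the chain pick up where it stopped. With several chains, stan_variable pools all of them, so [-1] yields a single draw; per-chain inits would need the draws split by chain first.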

there’s a certain amount of data-munging required to extract/munge/marshal this information, hence the above-mentioned PR. also problematic: CmdStan’s stansummary function can’t be used here because it only analyzes sampling iterations, not warmup. CmdStanPy will let you export the sample to ArviZ, though, and then you could use its diagnostics. all of this needs more investigation.

CmdStanPy provides the function write_stan_json, which creates the JSON input files needed for the metric and inits arguments of the CmdStanModel sample method.
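
A sketch of writing the checkpoint files for the next job, assuming the default diag_e metric. Normally write_stan_json(path, data) would do the writing (it also handles numpy arrays). Here the standard json module is swapped in so the sketch runs without CmdStanPy, which works once every value is a plain Python number or list. The helper names and the <tag>_inits.json / <tag>_metric.json naming are made up for illustration.

```python
import json
import os
from typing import Any, Dict, Mapping, Sequence, Tuple


def checkpoint_payloads(inits: Mapping[str, Any],
                        inv_metric_diag: Sequence[float]
                        ) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (inits, metric) dicts in the layout CmdStan reads."""
    # A metric file holds a single "inv_metric" entry; for diag_e it is
    # one value per unconstrained parameter.
    metric = {"inv_metric": [float(x) for x in inv_metric_diag]}
    return dict(inits), metric


def write_checkpoint(directory: str, tag: str, inits: Mapping[str, Any],
                     inv_metric_diag: Sequence[float]) -> Tuple[str, str]:
    """Write <tag>_inits.json and <tag>_metric.json into directory; return both paths."""
    os.makedirs(directory, exist_ok=True)
    inits_data, metric_data = checkpoint_payloads(inits, inv_metric_diag)
    inits_path = os.path.join(directory, f"{tag}_inits.json")
    metric_path = os.path.join(directory, f"{tag}_metric.json")
    # Stand-in for cmdstanpy.write_stan_json(path, data); plain json is enough
    # once values have been converted with .tolist().
    for path, data in ((inits_path, inits_data), (metric_path, metric_data)):
        with open(path, "w") as f:
            json.dump(data, f)
    return inits_path, metric_path
```

On the next job, the two paths could be passed as inits=inits_path and metric=metric_path to CmdStanModel.sample, which accepts a JSON file path for both (for metric, a file containing an inv_metric entry).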

good luck!