I am currently running computationally demanding Stan models using CmdStan on a Linux server. Since the models take longer to run than the maximum time limit of a batch job, the server host recommended that I use checkpointing. I found the tool DMTCP, which supports checkpointing for R. Would this also work with RStan or CmdStanR?
In general, I would prefer to continue working with CmdStan, but there is no option for checkpointing, correct?
I would appreciate any suggestions and/or references to tutorials.
Checkpointing can mean slightly different things:
1. Resuming sampling such that the entire state of the random number generator is reinstated, yielding numerically identical samples as if you hadn’t stopped.
2. Resuming sampling such that you don’t “lose work” in the sampler’s efforts to adapt, but without reinstating the full RNG state, thereby yielding functionally equivalent but not numerically identical samples as if you hadn’t stopped.
I don’t believe that any of the interfaces achieve 1, and while you can do 2, it’s a bit of a manual process at present.
what @mike-lawrence said is correct - none of the interfaces support checkpointing.
first off - there’s always the “folk theorem” question - maybe there’s a problem with your model - see https://arxiv.org/pdf/2011.01808.pdf, section 5.1
is your model taking a long time during warmup? do you have confidence that the model has converged during warmup?
in theory, you can continue sampling by initializing the parameters and setting the step size and inverse metric - the CmdStanPy interface lets you access the model parameters as properly structured Python variables via its stan_variables method, which means that you could, in theory, take the last draw from your sample, dump it to a JSON dict, and use that to initialize your parameters, then continue running post-warmup with the specified step_size, metric, and init params.
as Mike said, not automatic.
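The manual process described above might be sketched roughly as follows. The helper that extracts the last draw is plain Python; the CmdStanPy calls are left commented as an untested outline, and the file names (`model.stan`, `data.json`, `inits.json`) are placeholders:

```python
# Sketch of a manual "checkpoint": pull the last draw, step size, and
# inverse metric out of a finished CmdStanPy fit and reuse them to
# continue sampling without re-running warmup.
import json


def last_draw_inits(stan_vars):
    """Take fit.stan_variables() (a dict of draws-first arrays) and
    return a JSON-serializable dict holding only the final draw."""
    out = {}
    for name, draws in stan_vars.items():
        last = draws[-1]
        # numpy arrays need .tolist() for JSON; scalars pass through
        out[name] = last.tolist() if hasattr(last, "tolist") else last
    return out


# --- usage with CmdStanPy (untested outline) ---
# from cmdstanpy import CmdStanModel
# model = CmdStanModel(stan_file="model.stan")
# fit1 = model.sample(data="data.json", chains=1)       # first batch
# inits = last_draw_inits(fit1.stan_variables())
# with open("inits.json", "w") as f:
#     json.dump(inits, f)
# fit2 = model.sample(                                  # continue sampling
#     data="data.json",
#     chains=1,
#     inits="inits.json",
#     step_size=fit1.step_size[0],              # reuse adapted step size
#     metric={"inv_metric": fit1.metric[0]},    # reuse adapted inv. metric
#     adapt_engaged=False,                      # skip warmup entirely
#     iter_warmup=0,
# )
```

Note that `stan_variables()` returns transformed parameters and generated quantities alongside the parameters proper; as discussed below, Stan should ignore the extras when the dict is used as an inits file.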
It’s slightly awkward but possible using (extending) CmdStanPy.
To make this semi-automatic, the wrapper needs to know what the parameters are called, although this might not actually be necessary.
not sure what you mean? if you supply a JSON file containing a dict over all Stan program variables as the initial parameter variables file, then the Stan I/O should do the right thing - supply the parameters, ignore the other variables.
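For illustration, such an inits file might look like this (variable names here are hypothetical; `mu` and `sigma` stand in for parameters, `y_rep` for a generated quantity that Stan would ignore when initializing):

```python
# Write a JSON inits file containing one full draw over all Stan
# program variables; Stan's reader picks out the parameters it needs
# and ignores the rest.
import json

draw = {"mu": 0.5, "sigma": 1.2, "y_rep": [0.4, 0.7, 0.3]}
with open("inits.json", "w") as f:
    json.dump(draw, f)
```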
Yes, I wasn’t sure whether Stan would complain, hence
That being said, if there are few parameters but many many transformed parameters and generated quantities things might get awkward.
agreed - currently very clunky.