Hi there, I’m trying to have stan, using the pystan3 interface, to run as much steps as it is required, up to a maximum number of steps to avoid waiting too long, such that the convergence parameters (I’m thinking mostly about the r_hat and ess) show a stable value.
The reason I want to do this is because a fixed number of steps (both for warmup and to sample the posterior distribution) seem like it could lead to some models converging and others don’t. To fix this I could save the chains and resume from there manually with more steps until the convergence parameters showed stable values, however, this makes it harder to share my convergence criteria and would depend on each model I used, it would be much nicer to say I ran X chains, with Y initial conditions until r_hat and/or ess were below this threshold. The number of warmup steps should also be dynamic, and I’m quite unsure what criteria to use to figure out “the best” warmup value.
I’m not just here just to ask how would one achieve this in pystan (there is this blog post which did this but it uses pystan2 and pystan3 seems to have a much lower number of features) but also to see if this is common practice or it makes any sense.
TL;DR: Trying to figure out how (and if it makes sense) to code pystan to use a dynamic number of steps, depending on parameters such as the r_hat and ess.
this might be sensible in some contexts. Although usually warmup is the longer part of computation (and cannot be easily shortened, although some improvements to warmup are in the works), so the gains from terminating the sampling early are likely to be limited to substantially less than 50% of the computation time.
I am not a PyStan user, so tagging @ariddell for clarifications on PyStan3, but AFAIK PyStan3 is based on httpstan, which doesn’t appear to have the facility to either prematurely terminate a fit and return the samples so far or to continue from the last sample of a previous fit.
Best of luck with your model!
Thank you for your reply.
I thought about the fact that PyStan is based on httpstan, and by the name it suggested that it only had one way communication, which is fine considering that I could simply tell Stan to compute X steps, return them to me, and then I would decide if they were to classify as warmup, sampling or that it had finished already.
Unfortunately, this is just an idea, and that’s almost as far as my knowledge goes, so before trying to figure how to do that, and estimating how hard it would be to achieve such a behavior, I wanted to make sure it made sense, so hopefully my context is one of those cases where this makes sense.
This is possible by starting new chains with no adaptation, passing in the mass matrix from the initial run and last sample from the initial run as starting points.
See here for R code to achieve the resume part.
@martinmodrak I’ve just remembered something, perhaps you can help me, when the algorithm is warming up, is it the same process as when it is sampling the posterior distribution? Because I’ve assumed it so far and, if that’s not the case, then you couldn’t consider some samples as warmup because they weren’t submitted to same process.
@mike-lawrence Thanks for the link, will check it out.
I am far from expert on this, but the short answer is no. During warmup several quantities (mass matrix, stepsize, maybe others - not completely fresh on the details) are updated every once in a while. For the sampling iterations, all such quantities are kept constant (because the theoretical guarantees for MCMC AFAIK only hold when you keep those quantities fixed). So you can’t completely safely retrospectively decide to include some warmup samples into your posterior. Also, retrospectively declaring some posterior samples as warmup only helps if the chain has trouble reaching the stationary distribution, but if the problem is that the mass matrix or other stuff got badly adapted, it will not help you.
The initial conditions aren’t a problem at all for me, however, I’m having troubles when it comes to the mass matrix.
First of all I’m assuming that once I have the initial conditions and the mass matrix I can resume sampling immediately, without warming up the chains, if this is not the case, please do correct me if I’m wrong.
From the PyStan API reference it is mentioned that the keyword arguments are passed to the Python wrapper of
stan::services::sample::hmc_nuts_diag_e_adapt, where the C++ code is available here, and doesn’t feature an input named
inv_metric as your post indicates, but an
initial_inv_metric, which is probably the same thing.
Now my question lies on how do I get such a matrix, because from your code in R it seems that comes “out of the box” somehow, but I don’t understand R at all, but it looks as if the line
inv_metric = warmup$inv_metric(matrix=F) does the whole thing.
This can be done either with CmdStanPy (remember to increase decimal accuracy to 16 digits) or with Pystan 2. Current PyStan 3 does not support this feature, until it becomes an official feature.
Thank you for your reply, I will mark this as the solution since this questions is regarding PyStan3 in its current state.
May I also ask why does it seem that PyStan3 lacks so much features when compared to PyStan2?
Also, is there a plan to add this feature to PyStan3?
Many of the methods were brittle and unofficial → this lead to unmaintainable codebase. pystan3 tries to do things differently so that the things are better in the long run.
For this to be an official method, I think there should some discussion and decisions to be made in the main project (seed + rng, interface, etc)