Detecting convergence issues during estimation?

I’ll preface this question by admitting that my understanding of MCMC and NUTS is pretty limited, and by apologizing if this is a simple or silly question. But I was wondering whether it would be possible to abort the MCMC as soon as a convergence issue (e.g., a divergent transition or hitting the maximum treedepth) is encountered. I’m not sure whether these things are only detectable in retrospect, by examining the whole posterior sample, but if they can be detected during estimation, having the option to abort the MCMC on the spot would save a lot of time otherwise spent completing models that will ultimately need to be reparameterized or rerun with a different configuration. I have some models that require days to run, and it is painful when they fail after all that.

3 Likes

@bbbales2

Hmm, having these things reported during the run does seem like a useful workflow idea.

Divergences happen regularly during warmup (the timestep adaptation is kind of crazy), though, and treedepth problems do too before things are adapted.

Are you hitting this mostly with new models and then having to work out the kinks to make them fast?

Or is it that you have a model and you’re applying new data and getting surprising failures?

1 Like

Most of the time, it will be a new model and any convergence issues will be resolved by simply increasing adapt_delta, max_treedepth, or iterations.
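For concreteness, a minimal sketch of how those settings are typically raised in RStan (mymodel.stan and mydata are placeholder names, and the specific values are just illustrative):

library(rstan)

# adapt_delta and max_treedepth are passed via the control list;
# iter raises the total number of iterations per chain.
fit <- stan(
  file = "mymodel.stan",
  data = mydata,
  iter = 4000,
  control = list(adapt_delta = 0.99, max_treedepth = 15)
)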

Occasionally, there will be a bigger issue, like accidentally setting crazy priors, but those models often seem to fail faster.

Maybe the easiest way to monitor this stuff, then, is running models in CmdStan.

If you’ve got the model ready for a 24-hour run, or whatever, dump the data from RStan (stan_rdump(names(mydata), "filename.dat", envir = list2env(mydata))) and then run the model in CmdStan with save_warmup=1:

./mymodel sample save_warmup=1 data file=filename.dat

By default the output goes to output.csv. You can open that file with Excel/LibreOffice/whatever and thumb through it while things sample. You can watch for big treedepths and divergences there.

bin/stansummary works for partial output, but only if you’ve made it through warmup. You can also read those CSVs into R and make diagnostic plots, etc.
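As a rough sketch, checking a partial CmdStan CSV from R might look like this (assuming the default output.csv in the working directory and the default max_treedepth of 10):

# CmdStan's comment lines start with '#', so skip them when reading.
draws <- read.csv("output.csv", comment.char = "#")

sum(draws$divergent__)         # divergent transitions so far
max(draws$treedepth__)         # largest treedepth seen so far
mean(draws$treedepth__ >= 10)  # fraction of draws hitting the default max_treedepth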

I guess it’s all not as automatic as it could be, but that’s vaguely how to do this sorta monitoring.

I want to reiterate that this would be a useful feature for future development, if at all possible. Ideally, users would receive feedback on detected problems inline with the usual progress output in R. A notional example:

Chain 1: Iteration: 1001/2000 [ 50%] (Sampling)
** Warning: divergent transition after warmup **
** Warning: divergent transition after warmup **
Chain 1: Iteration: 1200/2000 [ 60%] (Sampling)

etc…

3 Likes

@betanalpha, thoughts?

This would work well for me! I already compulsively check the feedback…

The internal algorithm code returns full information at each iteration and it’s the interfaces’ responsibility to store/display any such information. brms just calls RStan so this is an RStan concern, not a brms concern.

RStan and PyStan store intermediate results in memory, where they are not accessible until after the chains have run, but adding the diagnostic_file option streams information about the unconstrained states, including diagnostic information, to a CSV-like text file that can be analyzed mid-run. In CmdStan and its variants, setting save_warmup allows you to analyze the warmup iterations as well, although that’s not an option in RStan and PyStan.
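A minimal sketch of that option in RStan (mymodel.stan, mydata, and the file name are placeholders; RStan writes one diagnostic file per chain):

library(rstan)

# Stream per-iteration diagnostics (unconstrained values, divergences,
# treedepths, ...) to text files that can be inspected while sampling runs.
fit <- stan(
  file = "mymodel.stan",
  data = mydata,
  diagnostic_file = "diagnostics.csv"
)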

What one has to remember, however, is that diagnostics are properly interpreted as expectation values. This means they are best understood in the context of entire Markov chains, not pointwise. Divergences, for example, tell you something about the local geometry of your posterior density function, but the rate of divergences tells you something about the extent of the pathological geometry relative to the entire geometry, and hence how aggressively you might have to set adapt_delta. The same goes for the treedepth warnings and setting max_treedepth.

Aborting too early can lead to suboptimal reconfigurations and additional failed runs, so there is a balance to strike. Of course, this all depends on how carefully one reacts to the diagnostic information. If one treats divergences as binary (do divergent transitions exist or not?), then the additional information gained from running reasonably long chains may not be useful.
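To make that distinction concrete, here is the binary check versus the rate estimate, reusing the draws data frame from the partial-output sketch earlier in the thread:

divergent <- draws$divergent__  # 0/1 indicator per iteration

any(divergent == 1)  # binary: did any divergent transitions occur?
mean(divergent)      # rate: what fraction of transitions diverged?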

1 Like

Continuing the discussion from Detecting convergence issues during estimation?:

This looks relevant: chkptstanr, an R package for checkpointing Stan runs so they can be stopped and restarted.

2 Likes