Any intuitions on evaluating accept_stat & stepsize?

mike-lawrence · November 10, 2020, 11:01pm

Poor behavior in the tails is the kind of pathology that can be uncovered by running only a few warmup iterations. Looking at the acceptance probabilities and step sizes of the first few iterations provides an idea of how bad the problem is and whether it must be addressed with modeling efforts such as tighter priors or reparameterizations.

Do we have any intuitions on useful thresholds for detecting things going awry with either metric? (I’m working on some during-warmup/sampling diagnostic ideas)

bbbales2 · November 11, 2020, 1:25pm

In some order,

Divergences
Max treedepth exceeded
Low Neff
Bad Rhat (multiple chains makes this work really well imo)
Slow chains/high treedepths
Some chains fast and some chains slow
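For the Rhat item, here is a minimal sketch of the classic split-Rhat calculation (not the rank-normalized version that newer Stan tooling reports). The function name `split_rhat` and the equal-length-chains requirement are assumptions made for illustration, not anything from Stan's own code.

```python
from statistics import mean, variance

def split_rhat(chains):
    """Classic split-Rhat for one parameter.

    `chains` is a list of equal-length lists of draws (an assumption of
    this sketch). Each chain is split in half so that within-chain drift
    shows up as between-"chain" variance.
    """
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])       # first half
        halves.append(chain[-half:])      # second half (middle draw dropped if odd)
    n = len(halves[0])
    # W: average within-half variance; B: n times the variance of half means.
    within = mean([variance(h) for h in halves])
    between = n * variance([mean(h) for h in halves])
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5
```

Values near 1 suggest the halves agree; chains sitting at different levels push the value well above 1.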

I thought this worked pretty well: [1905.11916] Selecting the Metric in Hamiltonian Monte Carlo. That’s a heuristic that, given two metrics, guesses which will do better. Sorta different from detecting that things went awry; it’s more a guess at what might break in the future.

Cool. It’s something we want to improve.

The workflow paper: http://www.stat.columbia.edu/~gelman/research/unpublished/Bayesian_Workflow_article.pdf probably has more of the inspiration – some combination of wanting to run faster + fail faster

There’s a channel on the slack where we’re talking about benchmarking this stuff: Mc-stan community slack (basically uses of GitHub - stan-dev/posteriordb: Database with posteriors of interest for Bayesian inference)

mike-lawrence · November 11, 2020, 5:07pm

Oh, my bad; I used the word “metric”, which I know has a technical meaning, but I actually simply meant the values in the accept_stat__ and stepsize__ columns. I’m nowhere near knowledgeable enough to try to tackle assessing the actual HMC metric proper (nor do I really understand what it even is!). I’m just looking to add some monitoring of the csv content during warmup (I already have divergence and treedepth watching, and presumably rhat/ess stuff shouldn’t be computed until sampling begins).
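As a rough illustration of that kind of csv monitoring, here is a sketch that pulls the accept_stat__ and stepsize__ columns out of Stan-CSV-style text. It assumes the usual layout (comment lines starting with `#`, then a header row, then one row per draw) and that warmup draws are actually being written, which in CmdStan requires `save_warmup=1`. The function name `read_sampler_columns` is hypothetical.

```python
import csv

def read_sampler_columns(csv_text, columns=("accept_stat__", "stepsize__")):
    """Extract the named sampler columns from Stan-CSV-style text."""
    # Drop blank lines and '#' comment lines (config, adaptation info, timing).
    lines = [ln for ln in csv_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    out = {name: [] for name in columns}
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return out  # nothing written yet
    idx = {name: header.index(name) for name in columns}
    for row in reader:
        # A file read mid-run may end in a partially written row; skip it.
        if len(row) != len(header):
            continue
        for name, i in idx.items():
            out[name].append(float(row[i]))
    return out
```

Calling this repeatedly on the file contents while the chain runs gives the two series so far; which thresholds to apply to them is exactly the open question here.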

bbbales2 · November 11, 2020, 7:19pm

Hmm, have you seen this: Issue with dual averaging ?

Sometimes when I’m running models I would like to see the lp__ in each chain, for instance. Then I could vaguely see if everything ended up in the same place.

This still goes back to the workflow paper, failing fast and all. If the chains aren’t getting to around the same lp__ within 100-150 draws, it’s time to kill the chains and see what is going on.
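A crude sketch of that fail-fast check: compare each chain's recent lp__ level against the typical within-chain variation. The names `lp_agreement` and `chains_disagree`, the `window` of 50 draws, and the `factor` of 3 are all arbitrary assumptions for illustration, not tested thresholds.

```python
from statistics import mean, stdev

def lp_agreement(lp_by_chain, window=50):
    """Return (spread of recent chain means, average within-chain sd).

    `lp_by_chain` is a list of per-chain lists of lp__ values so far;
    each chain needs at least two draws.
    """
    tails = [chain[-window:] for chain in lp_by_chain]
    means = [mean(t) for t in tails]
    sds = [stdev(t) for t in tails]
    return max(means) - min(means), mean(sds)

def chains_disagree(lp_by_chain, window=50, factor=3.0):
    """Flag chains whose recent lp__ levels differ by much more than
    their typical within-chain noise (factor is an arbitrary choice)."""
    spread, typical_sd = lp_agreement(lp_by_chain, window)
    return spread > factor * typical_sd
```

After 100-150 draws, `chains_disagree(...)` returning True would be a prompt to stop and look, not proof that something is wrong.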

There are a few things here:

Monitoring chains as they run
Async cmdstan interface so it is possible to monitor stuff as it runs
Doing analysis on killed chains – that might mean chains of different lengths at various stages of adaptation
Restarting chains from where you killed them

I don’t mean to weigh you down with giant projects but if you’re feeling keen to write experimental R packages then yeah, we’re curious how much all these things are worth. I’ve talked to @jtimonen about this some.
