R*: A robust MCMC convergence diagnostic with uncertainty using gradient-boosted machines

A new convergence diagnostic with Ben Lambert: “R*: A robust MCMC convergence diagnostic with uncertainty using gradient-boosted machines” https://arxiv.org/abs/2003.07900. The idea is to use a machine learning classifier for multivariate convergence diagnostics (whereas Rhat is usually univariate).

A machine learning classifier (here, gradient-boosted regression trees) is trained to classify which draws come from which chain. If the chains are mixing well, it is not possible to beat random guessing; in case of bad mixing, it is possible to separate draws from different chains.
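To make the mechanism concrete, here is a rough sketch of the idea (not the paper’s exact implementation): pool the post-warm-up draws, train a classifier to predict which chain each draw came from, and compare its held-out accuracy to random guessing. For brevity the sketch uses a random forest from the randomForest R package as a stand-in for the gradient-boosted trees in the paper; `compute_rstar_sketch`, `draws`, and `train_frac` are illustrative names.

```r
# Sketch of the classifier-based diagnostic. Assumes `draws` is an
# iterations x chains x parameters array, e.g. as returned by
# rstan::extract(fit, permuted = FALSE).
library(randomForest)

compute_rstar_sketch <- function(draws, train_frac = 0.7) {
  n_iter   <- dim(draws)[1]
  n_chains <- dim(draws)[2]

  # Flatten to one row per draw, with the chain index as the class label
  X <- matrix(draws, nrow = n_iter * n_chains)
  colnames(X) <- paste0("param", seq_len(ncol(X)))
  y <- factor(rep(seq_len(n_chains), each = n_iter))

  # Hold out part of the draws to measure out-of-sample accuracy
  idx  <- sample(length(y), size = floor(train_frac * length(y)))
  fit  <- randomForest(x = X[idx, , drop = FALSE], y = y[idx])
  pred <- predict(fit, X[-idx, , drop = FALSE])
  acc  <- mean(pred == y[-idx])

  # Accuracy relative to random guessing (1 / n_chains):
  # close to 1 when chains are indistinguishable, above 1 when they separate
  acc * n_chains
}
```

The real method adds more than this (for example, the uncertainty quantification in the paper’s title), so treat this only as an illustration of the chain-classification idea.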

Ben had the idea and wrote the first version, and then contacted me for comments. I was very skeptical, as machine learning classifiers can be sensitive to algorithm parameters. I proposed additional experiments, and after several iterations and more experiments I was convinced.

The benefit is that this can be done once for all parameters, and it can detect mixing problems which may not appear in the marginals. The classifier is non-parametric and doesn’t assume finite variance (unlike the old Rhat). We believe it’s going to be a useful complementary approach.

The code for the method and the experiments is provided at https://github.com/ben18785/ml-mcmc-convergence

We hope to get feedback if you try this out.


Such a simple-yet-powerful idea!

@avehtari, would you say a bit more about why you thinned the post-warm-up iterations? Sometimes there was no thinning, sometimes thinning by a factor of 3, and other times by a factor of 5. It seems that in general this community suggests thinning is not necessary when using Stan.

Was it an issue with autocorrelation? Or more to do with the computational complexity of boosted regression trees, maybe computational time, memory constraints, or both? All of the above?

Thanks.

  • Thinning reduces information, so if thinning is not needed, it is not recommended.
  • In many cases the dynamic HMC in Stan is so efficient that the default number of iterations provides sufficient accuracy and there are no memory issues.
  • In some cases dynamic HMC in Stan can also have such high autocorrelation that, if there are many parameters, it may be beneficial to thin to save disk space, memory, and computation time for derived quantities (see the sketch below).
  • In some cases we really need almost independent draws (e.g. SBC), and then we need to thin even antithetic chains (which have better efficiency than independent draws for certain expectations).

So the generally seen recommendation not to thin holds often, but it is not the recommendation for every case.

Yes.
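If you do decide to thin, here is a minimal sketch using the posterior package; thin_draws() keeps every thin-th draw of each chain, and the factor 5 below is just illustrative:

```r
library(posterior)

draws   <- example_draws()              # small example draws object shipped with posterior
thinned <- thin_draws(draws, thin = 5)  # keep every 5th draw of each chain
niterations(draws)                      # iterations per chain before thinning
niterations(thinned)                    # iterations per chain after thinning
```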


Thanks, @avehtari. If you will, a few follow-up questions. If you had infinite computing resources, would you have thinned? For a fixed model, did R*’s answer to the question “have the chains converged?” vary depending on the amount of thinning? How did you choose the thinning factor?

With infinite computing resources we don’t need convergence diagnostics; we can choose trivially simple but slow algorithms that, given infinite time, produce the exact expectation.

R* is not sensitive to thinning, that is, it’s not sensitive to autocorrelation. Ben did test the effect of autocorrelation.

Ask Ben for the details, but the choice is mostly arbitrary and made for computational convenience. It would be different if approximately independent draws were needed, but that is not the case for R*.

There’s also now a preliminary implementation in the posterior package if you want to play around with it: https://github.com/stan-dev/posterior
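A minimal usage sketch, assuming the development version of posterior from GitHub; since the implementation is preliminary, the rstar() interface may still change, and it may also need the caret package installed:

```r
# remotes::install_github("stan-dev/posterior")  # development version with rstar()
library(posterior)

draws <- example_draws()   # small example draws object shipped with posterior
rstar(draws)               # values close to 1 are consistent with well-mixed chains
```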


Thanks to you both. That looks great.