Rhat values from a single chain vs multiple chains

Hi everyone,

After running 4 chains on some model (1000 warmup iterations, 1000 sampling iterations), I ran summary diagnostics on each chain separately, as well as on all 4 chains.

I found that for some estimands, Rhat was large (>1.1) in a single chain, but satisfactory (<1.05) when calculated on all chains. I’m wondering what to make of it. Should I be happy with the results, since the variance in estimating Rhat is larger when using a single chain? Or should I be sad (!), since the definition of Rhat involves averaging across chains, thereby possibly washing out “bad” effects that are only apparent in a single chain?
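To make the comparison concrete, here is a minimal sketch of the per-chain versus pooled calculation. It assumes the basic split-Rhat (each chain is split in half, then between-half and within-half variances are compared), not the rank-normalized version that Stan and the posterior package report by default. The simulated draws and the names `split_rhat`, `good`, and `drifting` are hypothetical and for illustration only.

```python
import random
import statistics


def split_rhat(chains):
    """Basic (non-rank-normalized) split-Rhat for one scalar quantity.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    # Split every chain in half, so drift inside a chain shows up as
    # disagreement between its halves. This is also what makes Rhat
    # computable from a single chain.
    halves = []
    for draws in chains:
        half = len(draws) // 2
        halves.append(draws[:half])
        halves.append(draws[half:2 * half])

    n = len(halves[0])
    half_means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])  # W
    between_over_n = statistics.variance(half_means)                     # B / n
    var_plus = (n - 1) / n * within + between_over_n
    return (var_plus / within) ** 0.5


# Hypothetical draws: three stationary chains and one chain with a slow drift.
rng = random.Random(1)
good = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(3)]
drifting = [rng.gauss(0, 1) + 3.0 * i / 1000 for i in range(1000)]

good_rhats = [split_rhat([chain]) for chain in good]
drifting_rhat = split_rhat([drifting])
pooled_rhat = split_rhat(good + [drifting])

print("single-chain Rhat (stationary chains):", good_rhats)
print("single-chain Rhat (drifting chain):", drifting_rhat)
print("Rhat over all 4 chains:", pooled_rhat)
```

In this toy setup the drifting chain's own Rhat comes out larger than the pooled value, because the pooled between-half variance is averaged over the six well-behaved halves as well. Whether that is what is happening in the real model depends on why the single-chain value is large.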

I should also note that this is a model with many estimands of interest (many tens of thousands), so perhaps I should apply some correction to the Rhat threshold, as suggested here, or forget about Rhat completely and use MCSE instead.

Any help would be greatly appreciated!

Many thanks,



It is possible, but in that case the effect is likely washed out in the estimate of the quantity of interest as well. If you are worried, run longer chains or more chains.

Or use the multivariate R* diagnostic (the rstar function in the posterior package, documented as "Calculate R* convergence diagnostic") for all quantities jointly, and in addition Rhat and MCSE for the smaller number of quantities of interest. Running longer chains and more chains also makes it more likely that you don’t need to worry about the multiple comparison issue.

Thank you for clarifying this! And thank you for suggesting R*. I will read about it and see how it behaves on our model.