- In the end you should care about MCSE. Rhat is useful as a quick, scale-free diagnostic, but any Rhat threshold not derived from MCSE is ad hoc.
- The 1.01 threshold was chosen assuming that only one or a few Rhats are examined, and that the chains are run long enough to estimate autocorrelations well.
- In the new Rhat paper we didn’t explicitly discuss multiple comparisons, but what you write is the natural way to think about it.
- Multiple comparison correction as you describe is one way. When there are many variables, a fancier approach would be to use a model to learn the variation.
- Looking only at the percentage exceeding the threshold is not enough, as those that exceed it might exceed it by a lot, so it’s better to assume some distribution for the Rhats and compare to that.
- For repeated automated testing a single binary decision can be useful, but when the threshold is triggered, more information should be available. I usually just eyeball the Rhats, but plotting a histogram of the Rhats with an assumed distribution overlaid could be a useful way to check whether the highest Rhats are suspiciously high.
- If in doubt, run more iterations
- If more iterations would be very costly, look at the other diagnostics such as ESS, MCSE, R*, etc.
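
To make the "compare to an assumed distribution" idea concrete, here is a rough sketch in Python. It is only one possible way to do it, not what the Rhat paper does. The function names `split_rhat` and `reference_max_rhat` are made up for this example. The Rhat here is the plain split-Rhat, not the rank-normalized version. The reference distribution assumes independent standard normal draws, which ignores autocorrelation and is therefore optimistic for real MCMC output.

```python
import numpy as np

def split_rhat(draws):
    """Plain split-Rhat for one quantity; draws has shape (chains, iterations)."""
    draws = np.asarray(draws, dtype=float)
    n_iter = draws.shape[1]
    half = n_iter // 2
    # Split each chain in half so that within-chain drift also shows up.
    halves = np.concatenate([draws[:, :half], draws[:, n_iter - half:]], axis=0)
    n = halves.shape[1]
    within = halves.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * halves.mean(axis=1).var(ddof=1)    # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n
    return float(np.sqrt(var_plus / within))

def reference_max_rhat(n_params, n_chains, n_iter, n_sims=200, seed=0):
    """Simulated maximum Rhat over n_params quantities when every chain
    consists of independent N(0, 1) draws (an idealised, assumed reference)."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(n_sims)
    for s in range(n_sims):
        sims = rng.standard_normal((n_params, n_chains, n_iter))
        maxima[s] = max(split_rhat(x) for x in sims)
    return maxima

# Fake example: 50 parameters, 4 chains, 1000 iterations each.
rng = np.random.default_rng(1)
draws = rng.standard_normal((50, 4, 1000))
draws[0, 0] += 1.0  # shift one chain of parameter 0 to mimic poor mixing

rhats = np.array([split_rhat(x) for x in draws])
ref = reference_max_rhat(n_params=50, n_chains=4, n_iter=1000)
cutoff = np.quantile(ref, 0.99)          # cutoff for the largest of the 50 Rhats
flagged = np.flatnonzero(rhats > cutoff)

# Report how far the flagged Rhats are above the cutoff, not just how many.
for i in flagged:
    print(f"param {i}: Rhat = {rhats[i]:.4f} (cutoff {cutoff:.4f})")
```

Because the cutoff is taken from the simulated distribution of the maximum, it already accounts for looking at many Rhats at once. With autocorrelated chains the reference would need to be wider, for example by simulating with roughly ESS draws per chain instead of the full number of iterations.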