In the SBC package under development, I use the maximum Rhat value over all parameters in a fit as one of diagnostics whether the fit was OK overall. When exploring a set of all fits of the SBC runs I also look at the maxima of those maxima, as with 100+ fits one just cannot really look at single values.
In this setting it turns out that the default 1.01 threshold is quite strict and with a lot of variables and lots of fits, there will almost always be some Rhats that are a bit a larger (something like 1.015 or 1.023). My current understanding is that this is to be expected even for well-behaving models: Rhat is itself stochastic, so some proportion of slightly high Rhats will be seen.
So the question is:
- Is there a better way to summarise multiple Rhat values to get a good diagnostic than just taking the max?
- If taking the maximum is sensible, should I adjust the 1.01 threshold? And how? My current hacky idea is to assume that Rhats in a well behaving model are approximately i.i.d. N(1, 0.005) distributed and then use the extreme value distribution to estimate 0.99 quantile of the distribution of the maxima and use this as a threshold. This gives me the following thresholds:
n_vars rhat_thresh
1 1 1.011632
2 2 1.013574
3 3 1.014711
4 4 1.015517
5 5 1.016142
6 6 1.016653
7 7 1.017085
8 8 1.017460
9 9 1.017790
10 10 1.018085
11 100 1.019771
12 1000 1.022022
13 10000 1.024239
Does that make sense? Or maybe I should just avoid summarising and show histograms of rhats or similar? Or report the percentage of rhats that exceeds the 1.01 threshold and assume some low percentage is not a concern (and thus would not be reported as a warning)?
Thanks for any ideas.