Hi all,
There seems to be a variety of numbers floating around about the recommended cutoff for Rhat. I have been using 1.05 myself, but I have seen 1.01 and even 1.1. Any thoughts?
Also, I have read that the ratio of the effective sample size to the total sample size should not be less than 0.01, which seems awfully low. That just doesn’t seem right to me. Any thoughts on that as well? Thanks
David
Hi, so the cutoff is somewhat arbitrary and different people have argued for different values (see e.g. the post “Maybe it’s time to let the old ways die; or We broke R-hat so now we have to fix it” on the Statistical Modeling, Causal Inference, and Social Science blog for some notes on this). In my experience most models fail in many ways at once, so the exact cutoff doesn’t really matter, but Vehtari et al. make a sensible argument for 1.01, and tighter is definitely better. If you believe one Rhat came out large just due to noise (and all other diagnostics look fine), you can run the sampling once more and see if it disappears.
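In case it helps to see what the diagnostic actually measures, here is a minimal sketch of the basic split-Rhat computation in Python/numpy. This is the classic split version, not the full rank-normalized Rhat from Vehtari et al. (which is what `posterior::rhat` and ArviZ compute); `draws` is a hypothetical array of post-warmup draws for one parameter:

```python
import numpy as np

def split_rhat(draws):
    """Basic split-R-hat for one parameter.

    draws: array of shape (n_chains, n_draws), post-warmup.
    """
    n_chains, n_draws = draws.shape
    # Split each chain in half so within-chain trends also show up
    # as apparent between-chain disagreement.
    half = n_draws // 2
    chains = np.concatenate([draws[:, :half], draws[:, half:2 * half]], axis=0)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B and mean within-chain variance W.
    B = n * chain_means.var(ddof=1)
    W = chains.var(axis=1, ddof=1).mean()
    # Overestimate of the marginal posterior variance.
    var_plus = (n - 1) / n * W + B / n
    return np.sqrt(var_plus / W)

# Example: 4 chains x 1000 independent draws -> R-hat very close to 1.
rng = np.random.default_rng(1)
print(split_rhat(rng.normal(size=(4, 1000))))
```

The split-in-half step is why a chain that is slowly drifting gets flagged even when all chains happen to have similar overall means.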
The effective sample size cutoff is also completely arbitrary, but I think the rationale is that you can have sensible models that work OK-ish but just can’t easily be brought to a high-ESS regime. We don’t necessarily want to flag those models as problematic, because it is OK to just run them longer and thin the results. So a stricter cutoff, say ESS / total sample size < 0.1, would not be very specific to models that are actually “failing”. This is unlike Rhat, which is very specific: you almost never get well-behaved models with Rhat > 1.01.
A cutoff for ESS/total at 0.01 also means that with the default settings (4 × 1000 post-warmup samples, so an ESS of about 40) you still get somewhat useful inference for the relevant quantity (e.g. the mean or tail probabilities), but once again, I wouldn’t dwell on the specific number, as everybody’s sensitivity vs. specificity tradeoffs are different.
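To make that check concrete, here is a small sketch using ArviZ; the `idata` object is a hypothetical stand-in for your own fitted model, and the fake white-noise draws would of course give a ratio near 1:

```python
import numpy as np
import arviz as az

# Hypothetical stand-in: replace with the InferenceData from your fit.
idata = az.from_dict(
    posterior={"mu": np.random.default_rng(2).normal(size=(4, 1000))}
)

ess = az.ess(idata, var_names=["mu"])["mu"].item()  # bulk ESS
total = 4 * 1000  # 4 chains x 1000 post-warmup draws
print(ess / total)  # the 0.01 cutoff flags this only if ESS < 40
```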
Does that make sense?
Thanks. Makes perfect sense.