Understanding Rhat in results from Bayesian hierarchical modeling

Hi everyone,

I know that Rhat values close to 1 (below 1.1) are good and mean the chains are mixing well and have converged.

Does that mean that if we use a single chain (chains = 1), Rhat values don’t matter?

Can someone explain how to interpret Rhat values and what they actually mean?

Thanks a lot in advance,

Hi, I think that \widehat{R} looks at both the within-chain and the between-chain variance. Hence, setting chains = 1 would probably make \widehat{R} results questionable at best. (By the way, the threshold should be \widehat{R} < 1.01.)
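For reference, here is the classic (non-split) form of \widehat{R} for M chains of N draws each, written as a sketch in roughly the notation of the Stan reference manual (treat the exact symbols as illustrative). Here \theta_n^{(m)} is draw n of chain m, \bar{\theta}^{(m)} is the mean of chain m, and \bar{\theta} is the mean over all chains. B is the between-chain variance and W is the average within-chain variance:

```latex
B = \frac{N}{M-1} \sum_{m=1}^{M} \left( \bar{\theta}^{(m)} - \bar{\theta} \right)^2,
\qquad
W = \frac{1}{M} \sum_{m=1}^{M} s_m^2,
\qquad
s_m^2 = \frac{1}{N-1} \sum_{n=1}^{N} \left( \theta_n^{(m)} - \bar{\theta}^{(m)} \right)^2

\widehat{\mathrm{var}}^{+} = \frac{N-1}{N} W + \frac{1}{N} B,
\qquad
\widehat{R} = \sqrt{ \frac{\widehat{\mathrm{var}}^{+}}{W} }
```

\widehat{R} near 1 means the pooled variance estimate barely exceeds the within-chain variance, i.e. the chains agree with each other. With a single chain (M = 1) the factor 1/(M-1) in B is undefined, so this version cannot be computed at all.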


The definition we use for R-hat is in the Stan reference manual. It splits each chain in half before applying the “old” definition of R-hat. How close to 1 your R-hat values need to be depends on how accurate you need the results to be. We typically start with shorter chains while developing models, then run longer chains until the diagnostics stop griping before publishing. What you’re really looking for is a high enough effective sample size, as its estimate takes R-hat-like cross-chain information into account (that definition is also in the manual). But effective sample size estimates are unreliable when they’re small: you can’t trust an estimated effective sample size of 10 to truly be 10. It needs to be in the neighborhood of 100 per chain before the estimate becomes reliable.
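A minimal sketch of split R-hat as described above, in plain Python: each chain is cut into a first and a second half, and the classic formula is applied to the resulting half-chains. The function names rhat and split_rhat are hypothetical, dropping the middle draw of odd-length chains is an assumption of this sketch, and newer tooling (e.g. the posterior R package and ArviZ) reports a rank-normalized variant that this sketch omits. It also shows why a single chain still gets a diagnostic: its two halves are compared against each other.

```python
import random
from statistics import mean, variance


def rhat(chains):
    """Classic (non-split) R-hat for M equal-length chains of scalar draws."""
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need >= 2 equal-length chains with >= 2 draws each")
    # B: between-chain variance, N/(M-1) * sum_m (chain mean - grand mean)^2.
    # With equal-length chains the grand mean equals the mean of the chain means.
    b = n * variance([mean(c) for c in chains])
    # W: average of the within-chain sample variances.
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


def split_rhat(chains):
    """Split every chain into first and second halves, then apply rhat()."""
    halves = []
    for c in chains:
        h = len(c) // 2
        # Assumption of this sketch: for odd lengths the middle draw is dropped.
        halves.append(c[:h])
        halves.append(c[len(c) - h:])
    return rhat(halves)


# A single chain that drifts upward: its two halves disagree.
trend = [i / 100 for i in range(200)]
# A single chain of independent N(0, 1) draws: its two halves agree.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

print(split_rhat([trend]))  # far above 1.01
print(split_rhat([noise]))  # close to 1
```

The drifting chain gets a split R-hat far above 1.01, while the stationary one stays close to 1. Note that splitting only detects disagreement between the halves of a chain, so a single chain stuck in one mode of a multimodal posterior can still look fine; that is what running several chains guards against.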
