How rstan compute Rhat?

Hi all,
I want to implement Gelman-Rubin diagnostic to compute Rhat as a self-created function. The Gelman-Rubin diagnostic is referred to here(15.3 Notation for samples, chains, and draws | Stan Reference Manual). However, Rhat that rstan calculate is slightly different from my-created Rhat.

My calculation code is here. Rhat that I calculated is 0.9999659. However, Rhat that rstan calculated using summary function is 0.9999519.
Why are they different?

N = 8000 # the number of samples
M = 4 # the number of chain

# In this data.frame, MCMC samples are contained. nrow(df_rhat1) = N, ncol = M
df_rhat1 = data.frame(chain1 = chain_1[,i],chain2 = chain_2[,i],chain3 = chain_3[,i],chain4 = chain_4[,i])

theta_m = apply(df_rhat1, 2, mean) # mean within each chain
theta = mean(theta_m) # mean within all chain

B = N/(M-1)*sum((theta_m -theta)**2) #  The between-chain variance

sm2_pre = (df_rhat1-theta_m)**2 # parts of within-chain variance
sm2 = 1/(N-1)*apply(sm2_pre, 2,sum)

W = mean(sm2) # within-chain variance

var_plus = (N-1)/N*W + B/N # variance estimator
Rhat = sqrt(var_plus/W)
# > Rhat
# [1] 0.9999659

## Rhat that is computed in rstan summary
# round(summary(fit, pars = "delay")$summary, digits = 7) 
#            Rhat
# delay 0.9999519 # delay is a name of variable

delays.csv (570.9 KB)

I attached the data.frame that I used in the calculation.
And I confirmed this data.frame is definitely the same as the data.frame that rstan calculated.

Thank you for reading this question!
Please let me know if you want to ask me.

Thanks,
kei

Hi, welcome to the Stan forum. RStan is using the updated R-hat from this paper: [1903.08008] Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC, which is also available in the posterior R package.

Here is the code used in the RStan package:

Inside that Rhat function there are several calls to the internal rhat_rfun function, which is in the same file and more similar to your version:

Here’s further clarification. There are several Rhat versions. As the diagnostic presented in Gelman and Rubin (1992) has changed substantially, the later versions should not be called Gelman-Rubin diagnostic. The versions I know are
v1 Gelman & Rubin, 1992
v2 Brooks & Gelman, 1998 (fix)
v3 BDA2, Gelman et al., 2003 (simplification)
v4 BDA3, Gelman et al., 2013 (split + connection to ESS)
v5 Vehtari et al., 2021 (rank normalization, folding, and much more details on connection to ESS)

@jonah linked to RStan code where Rhat() is using v5 and rhat_fun() is using v3. CmdStan and CmdStanR are displaying Rhat v5. posterior R package has both v4 rhat_basic() and v5 rhat() (and an internal function .rhat() is v3).

2 Likes