Hi all,
I want to implement Gelman-Rubin diagnostic to compute Rhat as a self-created function. The Gelman-Rubin diagnostic is referred to here(15.3 Notation for samples, chains, and draws | Stan Reference Manual). However, Rhat that rstan calculate is slightly different from my-created Rhat.
My calculation code is here. Rhat that I calculated is 0.9999659. However, Rhat that rstan calculated using summary function is 0.9999519.
Why are they different?
N = 8000 # the number of samples
M = 4 # the number of chain
# In this data.frame, MCMC samples are contained. nrow(df_rhat1) = N, ncol = M
df_rhat1 = data.frame(chain1 = chain_1[,i],chain2 = chain_2[,i],chain3 = chain_3[,i],chain4 = chain_4[,i])
theta_m = apply(df_rhat1, 2, mean) # mean within each chain
theta = mean(theta_m) # mean within all chain
B = N/(M-1)*sum((theta_m -theta)**2) # The between-chain variance
sm2_pre = (df_rhat1-theta_m)**2 # parts of within-chain variance
sm2 = 1/(N-1)*apply(sm2_pre, 2,sum)
W = mean(sm2) # within-chain variance
var_plus = (N-1)/N*W + B/N # variance estimator
Rhat = sqrt(var_plus/W)
# > Rhat
# [1] 0.9999659
## Rhat that is computed in rstan summary
# round(summary(fit, pars = "delay")$summary, digits = 7)
# Rhat
# delay 0.9999519 # delay is a name of variable
I attached the data.frame that I used in the calculation.
And I confirmed this data.frame is definitely the same as the data.frame that rstan calculated.
Thank you for reading this question!
Please let me know if you want to ask me.
Here’s further clarification. There are several Rhat versions. As the diagnostic presented in Gelman and Rubin (1992) has changed substantially, the later versions should not be called Gelman-Rubin diagnostic. The versions I know are
v1 Gelman & Rubin, 1992
v2 Brooks & Gelman, 1998 (fix)
v3 BDA2, Gelman et al., 2003 (simplification)
v4 BDA3, Gelman et al., 2013 (split + connection to ESS)
v5 Vehtari et al., 2021 (rank normalization, folding, and much more details on connection to ESS)
@jonah linked to RStan code where Rhat() is using v5 and rhat_fun() is using v3. CmdStan and CmdStanR are displaying Rhat v5. posterior R package has both v4 rhat_basic() and v5 rhat() (and an internal function .rhat() is v3).