Rhat < 1 (as low as 9.94e-01). Why?

I keep getting Rhat below 1 (for usually about half of my parameters of interest at the level of 9.99e-01). A recent run (see below) reported Rhat < 1 for even smaller values (eg, 9.94e-01, 9.96e-01, 9.97e-01).

Anyone knows why Rhat < 1 is reported? (@bbbales2 kindly told me that Rhat in theory should be at least 1.)

Thanks for your sharing.

4 chains: each with iter=(150,150,150,150); warmup=(0,0,0,0); thin=(1,1,1,1); 600 iterations saved.
Warmup took (10096, 10544, 10196, 9615) seconds, 11.2 hours total
Sampling took (2052, 1257, 1786, 1156) seconds, 1.74 hours total
                        Mean      MCSE    StdDev         5%        50%        95%     N_Eff   N_Eff/s     R_hat
lp__                1.85e+04  8.37e-01  1.28e+01   1.85e+04   1.85e+04   1.86e+04  2.35e+02  3.75e-02  1.00e+00
accept_stat__       9.85e-01  1.26e-03  2.98e-02   9.36e-01   9.95e-01   1.00e+00  5.56e+02  8.90e-02  1.01e+00
stepsize__          1.25e-02  1.04e-03  1.48e-03   1.03e-02   1.38e-02   1.39e-02  2.01e+00  3.21e-04  7.39e+14
treedepth__         9.35e+00  2.61e-01  4.78e-01   9.00e+00   9.00e+00   1.00e+01  3.35e+00  5.37e-04  1.58e+00
n_leapfrog__        7.44e+02  1.33e+02  2.55e+02   5.11e+02   5.11e+02   1.02e+03  3.67e+00  5.87e-04  1.49e+00
divergent__         0.00e+00      -nan  0.00e+00   0.00e+00   0.00e+00   0.00e+00      -nan      -nan      -nan
energy__           -1.84e+04  1.30e+00  1.87e+01  -1.84e+04  -1.84e+04  -1.83e+04  2.06e+02  3.30e-02  1.00e+00
sd_y                8.09e-02  1.97e-05  5.76e-04   8.00e-02   8.09e-02   8.19e-02  8.51e+02  1.36e-01  9.96e-01
mu_u1               1.09e-01  3.28e-04  9.21e-03   9.30e-02   1.08e-01   1.24e-01  7.88e+02  1.26e-01  9.97e-01
mu_alpha            4.07e-02  9.24e-05  2.03e-03   3.74e-02   4.09e-02   4.41e-02  4.85e+02  7.76e-02  1.00e+00
beta                5.98e-01  2.77e-04  7.70e-03   5.85e-01   5.98e-01   6.12e-01  7.73e+02  1.24e-01  9.94e-01
theta               1.52e-01  1.25e-04  3.33e-03   1.46e-01   1.52e-01   1.57e-01  7.13e+02  1.14e-01  9.96e-01
sd_season           1.03e-01  1.46e-04  4.46e-03   9.55e-02   1.02e-01   1.10e-01  9.34e+02  1.49e-01  9.99e-01
mu_season[1]       -1.29e-01  3.78e-04  1.08e-02  -1.46e-01  -1.29e-01  -1.11e-01  8.13e+02  1.30e-01  9.98e-01
mu_season[2]       -7.07e-02  4.16e-04  1.17e-02  -9.08e-02  -7.10e-02  -5.17e-02  7.85e+02  1.26e-01  9.96e-01
mu_season[3]        1.57e-01  4.38e-04  1.14e-02   1.38e-01   1.57e-01   1.76e-01  6.73e+02  1.08e-01  1.00e+00
p[1]                7.33e-01  2.27e-03  5.98e-02   6.53e-01   7.21e-01   8.45e-01  6.95e+02  1.11e-01  9.99e-01
p[2]                6.15e-01  2.14e-04  5.22e-03   6.07e-01   6.15e-01   6.23e-01  5.94e+02  9.51e-02  1.00e+00
p[3]                7.27e-01  3.25e-03  8.55e-02   6.14e-01   7.10e-01   8.87e-01  6.93e+02  1.11e-01  9.99e-01
g[1]                7.63e-01  1.53e-03  4.67e-02   6.89e-01   7.61e-01   8.40e-01  9.26e+02  1.48e-01  1.00e+00
g[2]                3.46e-01  8.71e-04  2.18e-02   3.13e-01   3.46e-01   3.82e-01  6.26e+02  1.00e-01  1.00e+00
w[1]                6.19e-01  2.94e-03  6.99e-02   5.01e-01   6.16e-01   7.37e-01  5.63e+02  9.00e-02  9.99e-01
w[2]                1.77e-01  5.86e-04  1.37e-02   1.53e-01   1.77e-01   1.99e-01  5.47e+02  8.75e-02  1.00e+00
w[3]                6.63e-01  7.81e-04  2.27e-02   6.25e-01   6.63e-01   7.03e-01  8.45e+02  1.35e-01  9.96e-01
d[1]                5.88e-02  4.62e-04  1.19e-02   4.04e-02   5.86e-02   7.82e-02  6.58e+02  1.05e-01  9.96e-01
d[2]                6.91e-01  3.78e-04  9.55e-03   6.75e-01   6.92e-01   7.07e-01  6.39e+02  1.02e-01  1.00e+00
d[3]                2.50e-01  4.23e-04  1.14e-02   2.31e-01   2.50e-01   2.67e-01  7.20e+02  1.15e-01  1.00e+00

I think that you must have misunderstood him. Rhat should be close to 1, but it doesn’t matter if it’s slightly more or slightly less than 1.

Theoretically, it should be bigger than 1, According to https://arxiv.org/pdf/1903.08008.pdf the formula is \hat{R}^2 = \frac{N-1}{N} + \frac{1}{N}\frac{B}{W}, where B and W are both non-negative. So if you get something less than one it indicates some numerical instability in the computation. But in this case I wouldn’t be hugely worried: it’s giviving values of around 0.99, so I suspect it’s just boring old rounding error. The effective sample sizes all look fine. If you didn’t get any divergences, I’d say everything is ok.

(@avehtari: should we threshold it from below at 1?)

1 Like

It’s not a problem, it’s not a rounding error, it’s just that as the B and W are estimated from finite number of draws this can be slightly less than 1 just by chance. It is more likely to observe this if you run very short chains as you are now doing with just 150 iterations. It is more likely to observe this also if you start the chains underdispersed and don’t have warmup or very short warmup.
We could threshold it so that values below 1 are reported as 1, but then we would miss warning examples like this where very shirt chains are used :)

It seems you have problems in warmup as it takes 5 times longer than the actual sampling, but that is another issue.

4 Likes

Thanks for your clear explanation. I think better to leave it as it is. I hope this post would be useful to others who also have this confusion. Perhaps, if an explanation has been documented in the manual in detail, would be great to leave a pointer here.

Yes, I’ve been struggling with this for some time now. Though not showing in the cmdstan report is that I ran warmup=800, which is about 5 times of the sampling (150). My problem is that if I don’t run so many or if I set delta below 0.99, I easily get divergences.

I use 90,000 simulated data points. If I don’t use so much, it is hard to get estimates reasonably close to the true parameter values given the model complexity. The run time here is kind of the lowest achievable by me as a novice model builder. I have already used metric=dense_e, thanks to @bbbales2’s suggestion. Otherwise, the run time is over twice longer and also the N_Eff is less than one-tenth here given the short chain of the sampling stage.

May I ask: Given the data size, is the run time still unreasonable long compared to complex models you have seen? Is there anything a novice user can still try to see if there’s a quick fix in the slow run time?

It’d be worth doing a save_warmup=1 run and plotting traceplots.

Plot a few parameters and see what’s happening. If it doesn’t look like (from a far off view) that the chains are getting to the same region of parameter space by 150-200 samples, then it might either be worth increasing the initial adaptation window (the init_buffer argument to cmdstan). Pay attention to the treedepths as well and see if there are differences in different sections of warmup. There’s a graphic of how things work in the Cmdstan manual (Figure 9.3 in the 2.18.1 manual).

1 Like

Thanks for your suggestions. I am giving a reply in my another post.