Update: the problem seems to be almost exclusively due to the stochasticity of the Rhat estimate when the number of post-warmup samples is low.
Taking a very simple model:
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real mu;
}
model {
  mu ~ normal(0, 2);
  y ~ normal(mu, 1);
}
I simulate data exactly from this model.
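(For concreteness, here is a sketch of the kind of simulate-and-refit loop behind the numbers below, assuming cmdstanr; the file name, data size, and exact arguments are illustrative rather than my actual script.)

library(cmdstanr)

model <- cmdstan_model("simple_normal.stan")  # the Stan model above (file name illustrative)
N <- 20                                       # illustrative data size
n_fits <- 1000

rhats <- numeric(n_fits)
for(i in 1:n_fits) {
  # simulate data exactly from the model: mu ~ normal(0, 2), y | mu ~ normal(mu, 1)
  mu_true <- rnorm(1, 0, 2)
  y <- rnorm(N, mu_true, 1)
  fit <- model$sample(
    data = list(N = N, y = y),
    chains = 4,
    iter_warmup = 1000,   # swap to iter_warmup = 200, iter_sampling = 1000
    iter_sampling = 200,  # for the second configuration
    refresh = 0,
    show_messages = FALSE
  )
  rhats[i] <- fit$summary("mu")$rhat
}
mean(rhats > 1.01)  # proportion of fits flagged at the 1.01 threshold
max(rhats)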
Running 1000 fits with 1000 warmup iterations but only 200 sampling iterations: 499 fits had Rhat > 1.01; the largest Rhat was 1.042.
Running 1000 fits with 200 warmup iterations but 1000 sampling iterations: all Rhats were < 1.01.
So it seems that the model adapts and warms up quickly, and the chains explore the same distribution in both cases; the Rhat estimate simply has larger variance with fewer samples (which sounds unsurprising).
I did check that 200 i.i.d. samples yield unproblematic Rhat values while autocorrelated samples can have problems, so it is probably the combination of autocorrelation and a low number of samples that is tripping Rhat into giving false positives.
Code for simulation
I.i.d. samples:
n_iter <- 200
for(i in 1:10) {
  # 4 chains of n_iter i.i.d. standard normal draws, wrapped as a single rvar
  var <- posterior::rvar(array(rnorm(n_iter * 4), dim = c(n_iter, 4)),
                         with_chains = TRUE, nchains = 4)
  print(posterior::rhat(var))
}
A representative result:
[1] 1.000692
[1] 1.001277
[1] 0.9993861
[1] 0.9969532
[1] 1.000892
[1] 1.001532
[1] 1.006309
[1] 1.001748
[1] 0.9996341
[1] 1.004682
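Ten printed values don't show the spread that well; to quantify it, one can replicate this many times and look at the upper quantiles and the rate of exceeding 1.01 (the replication count below is arbitrary):

n_iter <- 200
rhats_iid <- replicate(1000, {
  draws <- array(rnorm(n_iter * 4), dim = c(n_iter, 4))
  posterior::rhat(posterior::rvar(draws, with_chains = TRUE, nchains = 4))
})
quantile(rhats_iid, c(0.5, 0.9, 0.99))
mean(rhats_iid > 1.01)  # false-positive rate of the 1.01 threshold for i.i.d. draws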
Autocorrelated samples (a sliding-window linear combination of i.i.d. normals - hope that's not particularly stupid):
n_iter_high <- 1000
n_iter_low <- 200
lag_coeffs <- rev(c(1, 0.8, 0.5))
N_sims <- 10
res <- array(NA_real_, dim = c(N_sims, 2),
             dimnames = list(NULL, paste0(c(n_iter_low, n_iter_high), "_iter")))
n_lags <- length(lag_coeffs)
for(i in 1:N_sims) {
  # Latent i.i.d. normals; each observed draw is a sliding-window linear
  # combination of n_lags consecutive latent draws, inducing autocorrelation.
  draws_latent <- array(rnorm((n_iter_high + n_lags) * 4), dim = c(n_iter_high + n_lags, 4))
  draws_observed <- array(NA_real_, dim = c(n_iter_high, 4))
  for(n in 1:n_iter_high) {
    for(c in 1:4) {
      draws_observed[n, c] <- sum(draws_latent[n:(n + n_lags - 1), c] * lag_coeffs)
    }
  }
  var_high <- posterior::rvar(draws_observed, with_chains = TRUE, nchains = 4)
  # Rhat on only the last n_iter_low iterations of the same chains
  var_low <- posterior::rvar(draws_observed[(n_iter_high - n_iter_low + 1):n_iter_high, ],
                             with_chains = TRUE, nchains = 4)
  res[i, 1] <- posterior::rhat(var_low)
  res[i, 2] <- posterior::rhat(var_high)
}
res
Gives something like:
200_iter 1000_iter
[1,] 1.005455 1.001189
[2,] 1.002999 1.000354
[3,] 1.020892 1.003734
[4,] 1.009019 1.003877
[5,] 1.011578 1.002240
[6,] 1.004152 1.001302
[7,] 1.006214 1.001396
[8,] 1.006232 1.001696
[9,] 1.010586 1.002576
[10,] 1.010942 1.000831
I.e. the last 200 iterations of the same array produce substantially higher Rhats than the full 1000 iterations.
I forgot about this - yes, that could help partially. But given that I have many fits, each with its own R* value, I still need to decide whether to raise a warning overall. Do you have an idea how one could pick a sensible threshold? Also, since R* is even more stochastic than Rhat, I would expect similar problems with short chains (but I haven't checked yet).
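One idea for the threshold that comes to mind (just a sketch, I haven't validated it): calibrate against a null distribution - simulate chains of the same length and roughly similar autocorrelation as the real fits that by construction share a single stationary distribution, compute the diagnostic for each, and warn only above a high quantile of that null. Shown with Rhat below; posterior::rstar() could in principle be substituted for R*, though I haven't checked how it behaves here.

n_iter <- 200
n_null <- 1000
lag_coeffs <- rev(c(1, 0.8, 0.5))
n_lags <- length(lag_coeffs)
null_diag <- replicate(n_null, {
  # autocorrelated chains that share the same stationary distribution by construction
  latent <- array(rnorm((n_iter + n_lags) * 4), dim = c(n_iter + n_lags, 4))
  observed <- sapply(1:4, function(c) {
    sapply(1:n_iter, function(n) sum(latent[n:(n + n_lags - 1), c] * lag_coeffs))
  })
  posterior::rhat(posterior::rvar(observed, with_chains = TRUE, nchains = 4))
})
threshold <- quantile(null_diag, 0.99)  # e.g. warn only above the 99% null quantile
threshold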