I have the following traceplot. The 4 chains look converged to me.
But if I check the Rhat, I get:
             mean se_mean   sd  2.5% 97.5% n_eff Rhat
beta0[1]    -2.24    0.02 0.07 -2.36 -2.11    13 1.14
beta0[2]    -2.75    0.03 0.09 -2.91 -2.56    11 1.15
Has the model converged? Thanks.
I know this isn’t the answer to your question, but wouldn’t it be easier to decide with more than 400 iterations and 200 warmup iterations? Maybe you can post the Rhats and trace plots after a larger number of iterations?
Apart from that, I do not know enough about how Rhat is computed to judge how serious this is. With respect to assumption testing in a frequentist context, I believe graphical/visual inspection is often better than cut-off values, but I am not sure whether that generalizes to Rhat. In fact, the documentation here (R: Convergence and efficiency diagnostics for Markov Chains) says: “We recommend running at least four chains by default and only using the sample if R-hat is less than 1.05.” A value of 1.15 is far beyond this cut-off.
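For instance, something along these lines (a minimal rstan sketch; `model` and `stan_data` are placeholder names, not objects from your code):

```r
library(rstan)

# Re-run with more iterations and a longer warmup (placeholder object names)
fit <- sampling(model, data = stan_data,
                chains = 4, iter = 2000, warmup = 1000, seed = 1)

# Re-check Rhat and n_eff for the parameters of interest
print(fit, pars = "beta0", probs = c(0.025, 0.975))
```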
Traceplots often look fine when you zoom out far enough.
What do the histograms of the samples look like for the different chains? (Also, a traceplot without warmup might work better.)
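Something like this could produce both plots (a sketch assuming a stanfit object named `fit`):

```r
library(rstan)
library(bayesplot)

# Traceplot of the post-warmup draws only
traceplot(fit, pars = "beta0", inc_warmup = FALSE)

# Per-chain histograms to compare the chains' marginal distributions
arr <- as.array(fit)  # iterations x chains x parameters
mcmc_hist_by_chain(arr, pars = c("beta0[1]", "beta0[2]"))
```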
Replying to both:
After running longer, there is a slight improvement in Rhat.
             mean se_mean   sd  2.5% 97.5% n_eff Rhat
beta0[1]    -2.23    0.01 0.07 -2.36 -2.10    69 1.06
beta0[2]    -2.75    0.02 0.09 -2.93 -2.57    18 1.12
I also have the traceplot without warmup.
Finally, here are the histograms of the samples by chain.
I guess I want to know: when fellow researchers encounter this type of situation, how do you decide? I don’t know how cut-and-dried the Rhat < 1.05 rule is. Your input is much appreciated.
Hi,
From visual inspection, I would say that your chains are not mixing well. The within-chain variance needs to be similar to the between-chain variance; Rhat is a measure of how similar these variances are. In my experience, an acceptable Rhat accompanies visually well-mixed traces.
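To make the comparison concrete, here is a rough R sketch of the basic (non-split, non-rank-normalized) Rhat computation; Stan’s current diagnostic additionally splits chains and rank-normalizes the draws:

```r
# `draws` is an iterations x chains matrix for a single parameter,
# e.g. as.array(fit)[, , "beta0[1]"] for a stanfit object `fit`.
basic_rhat <- function(draws) {
  n <- nrow(draws)                      # post-warmup draws per chain
  chain_means <- colMeans(draws)
  W <- mean(apply(draws, 2, var))       # mean within-chain variance
  B <- n * var(chain_means)             # between-chain variance
  var_plus <- (n - 1) / n * W + B / n   # pooled variance estimate
  sqrt(var_plus / W)                    # close to 1 when W and B agree
}
```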
That said, I am frequently at a loss as to how to “repair” a model in which the chains don’t mix well.
Good luck
If I’m not wrong, there is also some autocorrelation in your draws. What are ess_bulk and ess_tail (and what is ndraws)?
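These are easy to check with the posterior and bayesplot packages (a sketch assuming a stanfit object `fit`):

```r
library(posterior)
library(bayesplot)

arr <- as.array(fit)  # iterations x chains x parameters

# Rank-normalized Rhat plus bulk and tail effective sample sizes
summarise_draws(as_draws_array(arr), "rhat", "ess_bulk", "ess_tail")

# Autocorrelation by chain for the slow-mixing parameters
mcmc_acf(arr, pars = c("beta0[1]", "beta0[2]"))
```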
Thank you, everyone, for your replies. I used a less diffuse prior and the chains mix well now.
And this paper suggests an even stricter < 1.01 rule! All these suggestions are ad hoc starting points. Rhat and ESS (previously reported as n_eff) are useful because they are scale-free; that is, when checking them for many parameters, you don’t need to compare them to the standard deviation of the marginal posterior of that parameter or to domain knowledge.
If you don’t like the arbitrariness of the suggested Rhat and ESS thresholds, you can always, in the end, look at the Monte Carlo standard error (MCSE) for the quantities of interest and use domain knowledge to assess whether the accuracy is sufficient. Stan and ArviZ use Rhat and ESS to compute MCSE, so you get the benefit of the multi-chain diagnostic in MCSE as well, although MCSE estimates can be somewhat overoptimistic (say, a factor of two too small) if ESS (n_eff) is small.
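For example, with the posterior R package (a sketch; `fit` is a placeholder for the fitted object):

```r
library(posterior)

draws <- as_draws_array(as.array(fit))

# MCSEs of the posterior mean and sd alongside ESS, so the accuracy of the
# estimates can be judged against domain knowledge
summarise_draws(draws, "mean", "mcse_mean", "sd", "mcse_sd",
                "ess_bulk", "ess_tail")
```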
Great!
EDIT: changed the paper link to point to doi.org
@avehtari, your link “And this paper …” is dead. Can you update it, please?
Thank you! And here is the DOI in case their site changes again: 10.1214/20-BA1221
The link I added points to the DOI https://doi.org/10.1214/20-BA1221, not directly to the BA site.