New \widehat{R} and ESS
There’s a new paper discussing improvement for \widehat{R} and effective sample size diagnostics for MCMC
- Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, Paul-Christian Bürkner (2019): Rank-normalization, folding, and localization: An improved \widehat{R} for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.
You may also read @anon75146577’s blog post
I think the most important points are
- The new value of R-hat is robust against heavy tails and is sensitive to changes in scale between the chains.
- The new value of R-hat is parameterization invariant, which is to say that the R-hat value for \theta and \log(\theta) will be the same. This was not a property of the original formulation.
- New diagnostic effective sample size (ESS) which has the same properties as the new R-hat
- Recommendation to look at diagnostic ESS seprately for bulk and tails
- Separating diagnostic ESS, and ESS needed to compute MCSE for a quantity of interest (now assuming that quantity has finite variance)
- MCSE estimates for Median, MAD and quantiles
- Many nice plots for further diagnostics
When considering using these with Stan there are two issues (plots are easy as they go to bayesplot and arviz packages)
- what is shown in the default output
- where the computation happens
Proposal for the default output
The current default output looks like this:
Inference for Stan model: eight_schools_cp.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu 4.50 0.12 3.26 -1.96 2.28 4.53 6.66 10.80 695 1.01
tau 3.86 0.16 3.12 0.55 1.66 3.02 5.22 11.87 385 1.01
theta[1] 6.41 0.17 5.52 -2.88 2.93 5.88 9.19 19.09 1043 1.00
theta[2] 5.05 0.13 4.73 -4.17 2.26 4.99 7.74 14.96 1316 1.00
theta[3] 4.06 0.14 5.36 -7.77 1.09 4.32 7.23 14.52 1418 1.00
theta[4] 4.89 0.14 4.77 -4.77 1.94 4.88 7.76 14.66 1118 1.00
theta[5] 3.75 0.15 4.59 -6.10 0.96 3.95 6.71 12.33 1002 1.00
theta[6] 4.16 0.13 4.71 -6.42 1.51 4.38 7.00 13.00 1280 1.01
theta[7] 6.44 0.16 4.99 -2.13 3.19 6.04 9.24 18.37 947 1.00
theta[8] 4.99 0.14 5.20 -5.41 1.93 4.86 7.85 16.44 1357 1.00
lp__ -15.03 0.43 6.07 -26.35 -19.40 -15.31 -10.74 -2.91 201 1.02
Samples were drawn using NUTS(diag_e) at Thu Mar 21 16:51:05 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
A new proposal is (there are more examples in the online appendix)
Inference for Stan model: eight_schools_cp.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
Q5 Q50 Q95 Mean SD Rhat Bulk_ESS Tail_ESS
mu -0.93 4.53 9.85 4.50 3.26 1.01 695 1124
tau 0.71 3.02 9.79 3.86 3.12 1.02 218 229
theta[1] -1.24 5.88 16.01 6.41 5.52 1.01 991 1590
theta[2] -2.44 4.99 12.84 5.05 4.73 1.00 1247 2033
theta[3] -4.99 4.32 12.13 4.06 5.36 1.00 1168 1787
theta[4] -2.91 4.88 12.73 4.89 4.77 1.00 1024 1905
theta[5] -3.99 3.95 10.80 3.75 4.59 1.00 951 2013
theta[6] -3.97 4.38 11.45 4.16 4.71 1.01 1196 1701
theta[7] -0.91 6.04 15.06 6.44 4.99 1.00 936 1583
theta[8] -3.08 4.86 13.78 4.99 5.20 1.00 1287 1492
lp__ -24.50 -15.31 -4.61 -15.03 6.07 1.02 210 240
For each parameter, Bulk_ESS and Tail_ESS are crude measures of
effective sample size for bulk and tail quantities respectively (good values is
ESS > 400), and Rhat is the potential scale reduction factor on rank normalized
split chains (at convergence, Rhat = 1).
In addition, although not shown by default method the returned object would include values for
Q5_MCSE, Q50_MCSE, Q95_MCSE, Mean_MCSE, SD_MCSE, MAD, and MAD_MCSE.
Comments on the default output or what is computed?
What is the process to decide how and when the default output changes?
Where the computation happens
Currently CmdStan, RStan and PyStan compute the quantities shown in the default output in C++, but they use three separate C++ files for that so it’s not happening in just one place. To make the default output same the computation needs to be implemented in C++ at least for CmdStan. The same C++ code can be copied to C++ files used by RStan and PyStan. I would keep it as a separate discussion how to make the computation to happen only in one place, as there seems to performance issues in which data structure types are used in Stan, RStan and PyStan, and at least I don’t know how to fix this.
Anyway, I can add computations to C++ in the current way or after someone has made refactoring of the code. Although we can use one computation code for different Stan interfaces, it’s likely that the there will be R and Python code as well as bayesplot and arviz would have new plots needing these computations and both of these packages are generic beyond Stan and probably these packages don’t want include dependency on Stan C++.
Comments on where the computation happens?