Stan's ESS (and monotone autocorrelations)

Ben_Lambert · March 13, 2020, 2:01am

I am trying to code up Stan’s effective sample size estimator as defined in Sections 16.4 of the reference manual here and basically wondered whether I’d understood the recipe correctly. Does the order of things go like:

take only post warm-up samples
split chains into two (so M = 2 * number of chains run)
calculate W, var_plus, s_m_sq and rho_t,m
use the above to calculate rho_t
calculate the series P_t = rho_2t + rho_2t+1 for t = 1,…,m, where m is the largest integer such that P_t > 0 for all t <= m
create monotone version of P_t (I call it P_t_mon) by setting P_t_mon = P_t, if P_t-P_t-1<0; or P_t_mon = P_t-1_mon if otherwise (i.e. make it equal to the min of the current state of P_t_mon)
calculate tau = -1 + 2 * sum(P_t)
calculate ess = N * M / tau

Is this right? If I follow the above recipe with outputs from Stan I get values a bit different from the ones that are in the fit summary, so suspect I must be being dense somewhere (apologies in advance)…Ben

bgoodri · March 13, 2020, 2:12am

Not sure, but I am sure it is calculated like

github.com

stan-dev/rstan/blob/develop/rstan/rstan/R/monitor.R#L213


#'
#' @param sims A 2D array _without_ warmup samples (# iter * # chains).
#'
#' @return A single numeric value for the effective sample size.
#'
#' @references
#' Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, and
#' Paul-Christian Bürkner (2019). Rank-normalization, folding, and
#' localization: An improved R-hat for assessing convergence of
#' MCMC. \emph{arXiv preprint} \code{arXiv:1903.08008}.
ess_rfun <- function(sims) {
  if (is.vector(sims)) {
    dim(sims) <- c(length(sims), 1)
  }
  chains <- ncol(sims)
  n_samples <- nrow(sims)
  if (n_samples < 3L || anyNA(sims)) {
    return(NA)
  }
  if (any(!is.finite(sims))) {
    return(NaN)

on split chains.

avehtari · March 13, 2020, 8:10am

See also [1903.08008] Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC.

Which interface and which version? After I know that information, I can check that the reference manual matches the code (CmdStan, RStan, and PyStan, should have the same by default, while monitor in RStan and CmdStanR using posterior package use the rank normalized version).

Ben_Lambert · March 13, 2020, 11:57am

@avehtari and @bgoodri thank you. I am using pystan 2.19.1.1 (I don’t know how to check Stan’s version within pystan? I know rstan but, for a particular project am using pystan.)

In any case, it sounds like I should be using the rank normalised Rhat after reading the paper @avehtari suggests. I found your repo here: https://github.com/avehtari/rhat_ess/blob/master/code/monitornew.R Is this the current ground truth that I should be working off for these functions?

ahartikainen · March 13, 2020, 12:09pm

PyStan version follows Stan version.

import pystan
print(pystan.__version__)

There is rank normalized rhat in ArviZ package.

import arviz as az
summary = az.summary(fit) # pandas DataFrame
print(summary)

edit. https://github.com/arviz-devs/arviz/blob/master/arviz/stats/diagnostics.py

print(az.rhat(fit))

avehtari · March 13, 2020, 6:48pm

If you prefer Python, see @ahartikainen’s reply. Please report if those help, or if you think the documentation could be improved.

Topic		Replies	Views
Empty stanfit Object For Computing MCMC Summaries RStan	6	700	March 29, 2019
Writing a Program to Calculate ESS from Samples General interpret-results	3	415	December 18, 2023
Specify standard deviation of proposed candidate distributions from which the samples are drawn in Stan General	7	720	August 12, 2020
Effective Sample Size in SBC Modeling techniques , specification , simulation-based-calibration	5	789	February 12, 2021
Stan Algorithm API Algorithms	28	2205	December 8, 2019

Stan's ESS (and monotone autocorrelations)

Related topics