Definition of Markov chain standard error (MCSE)

Hi Stan community,

I am curious about the estimation of the asymptotic variance \sigma^2, appearing in the definition of the MCSE for the mean value of a parameter (see MCMC central limit theorem).

My question is:
Why is the estimate of this asymptotic variance defined as:

\sigma^2 \approx \frac{\hat{\sigma}^2_n}{N_{eff}},

in the Stan Reference Manual (MCSE section)? After all, the expression for \sigma^2 in the MCMC central limit theorem is much more complex.

What is the justification for this estimate?

Notation:

  • \hat{\sigma}_n is the standard deviation of the combined posterior draws across chains
  • N_{eff} is the effective sample size.

Remark:
@betanalpha has written about this in Markov Chain Monte Carlo in Practice (Section 4.2.2).
The starting point of his argument is the formula:

\lim_{N \rightarrow\infty} N \; \mathrm{Var}[\hat{f}^\mathrm{MCMC}_N] = \mathrm{Var}_\pi[f] \; \tau_l[f],

where \mathrm{Var}_\pi[f] can be approximated by \hat{\sigma}^2_n. But I do not understand where the term \mathrm{Var}_\pi[f] in the above formula comes from in the first place.

To get the ball rolling: where you say that the CLT expression is much more complex, you appear to be overlooking the complexity of the computation of the effective sample size in the denominator of the expression from the Stan reference manual, as well as some deep similarities between the two versions. Perhaps @avehtari might chime in with a clear answer to some of the specifics of your question.
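To give a flavor of that complexity, here is a crude single-chain sketch of the idea (the function name and the truncation rule are mine; Stan's actual estimator additionally uses multiple chains, within/between-chain variances, and Geyer's initial-sequence truncation):

```python
import numpy as np

def ess_sketch(x):
    """Crude single-chain effective sample size:
    N / (1 + 2 * sum of autocorrelations), truncating the sum at the
    first negative autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Autocovariances for lags 0..n-1 (O(n^2); use an FFT for long chains).
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]  # autocorrelations
    tail = 0.0
    for t in range(1, n):
        if rho[t] < 0:  # crude truncation rule
            break
        tail += rho[t]
    return n / (1 + 2 * tail)
```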

Thank you for your answer @jsocolar. Yeah, the estimation of the effective sample size is indeed rather complex (autocorrelations at lag t).

Let me explain a little bit more, so my question becomes clearer:

  1. One approximates the expectation of some real-valued function g(X) via MCMC samples X_1,\dots, X_N:
\hat{\mu} = \frac{1}{N} \sum^N_{i=1} g(X_i).
  2. One wants to estimate the variance of \hat{\mu}, i.e. the squared Markov chain standard error of the mean.
  3. According to https://en.wikipedia.org/wiki/Bienaymé's_identity:
\sigma^2 = \mathrm{Var}(\hat{\mu}) = \frac{1}{N^2} \sum^N_{i,j=1} \mathrm{Cov}\left(g(X_i), g(X_j)\right)

How does one derive from this last formula the relation \sigma^2 \approx \frac{\hat{\sigma}^2_n}{N_{eff}}?

Paralleling the ordinary central limit theorem, which assumes independent draws, the MCMC central limit theorem lets us define the standard error for a parameter \theta as:

\displaystyle \textrm{se} = \frac{\textrm{sd}}{\sqrt{N_{\textrm{eff}}}}

where \textrm{se} is the standard error of \theta and \textrm{sd} is the standard deviation (in the posterior) of \theta. The autocovariance you’re looking at is used in the definition of N_{\textrm{eff}}. See, for example, page 287 of Bayesian Data Analysis (free download) for the derivation. For a comparison of different estimators of effective sample size and more discussion, see @avehtari’s Comparison of MCMC effective sample size estimators, which points to our paper (mostly Aki’s!) on revised estimators for it.
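Concretely (modulo Stan's multi-chain bookkeeping), the definition is

\displaystyle N_{\textrm{eff}} = \frac{N}{\sum^{\infty}_{t=-\infty} \rho_t} = \frac{N}{1 + 2 \sum^{\infty}_{t=1} \rho_t},

where \rho_t is the autocorrelation of the chain at lag t; in practice the sum is truncated and the \rho_t are estimated from the draws.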

Thank you for your answer @Bob_Carpenter.

The point of my question is how do you get to the formula you quoted:

\mathrm{se} = \frac{\mathrm{sd}}{\sqrt{N_{eff}}}

For example, Geyer (2005) suggests using batch means to estimate what you call \mathrm{se}, and I do not see how that approach and the above formula are connected.
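For concreteness, by batch means I mean something like the following sketch (my own toy code and naming, not Geyer's exact recipe):

```python
import numpy as np

def batch_means_se(x, num_batches=30):
    """Batch-means estimate of the MCSE of the mean of a chain x:
    split the chain into consecutive batches, average each batch,
    and use the spread of those batch means."""
    x = np.asarray(x, dtype=float)
    batch_size = len(x) // num_batches
    # Drop leftover draws so the chain splits evenly into batches.
    batches = x[:num_batches * batch_size].reshape(num_batches, batch_size)
    means = batches.mean(axis=1)  # one mean per batch
    # If the batches are long enough to be nearly independent, the batch
    # means behave like i.i.d. draws, so the usual i.i.d. formula applies.
    return np.sqrt(means.var(ddof=1) / num_batches)
```

No autocorrelations appear anywhere here, which is exactly what puzzles me.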

Could you shed some light on this?

Can’t you recover the expression from the definition of the standard error? The N would remain, and you would have to replace the variance of the sum of independent samples, \mathrm{Var}(T), with the variance of the sum of correlated samples, i.e. covariances at all lags. Under stationarity each covariance factors as \mathrm{Cov}(X_i, X_j) = \mathrm{Var}(X) \, \rho_{|i-j|}, which is where the \mathrm{Var}_\pi[f] term comes from. That should lead to \sum_{t=-\infty}^{\infty} \rho_t, and N divided by that sum would give N_{eff}.

Sorry, I cannot work through the whole derivation myself right now, but does that not work?
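A quick numerical check suggests it does. Here is a toy example (my own code) with AR(1) chains, where the autocorrelation sum is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent AR(1) chains: x[t] = phi * x[t-1] + noise, with
# lag-t autocorrelation rho_t = phi**t, so that
# sum_{t=-inf..inf} rho_t = (1 + phi) / (1 - phi).
phi, n, chains = 0.8, 20_000, 500
x = np.empty((chains, n))
x[:, 0] = rng.normal(scale=1.0 / np.sqrt(1.0 - phi**2), size=chains)  # stationary start
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + rng.normal(size=chains)

# Empirical standard error of the mean: spread of the per-chain means.
se_empirical = x.mean(axis=1).std(ddof=1)

# The formula in question: sd / sqrt(N_eff), N_eff = N / sum of rho_t.
n_eff = n / ((1 + phi) / (1 - phi))
se_formula = x.std(ddof=1) / np.sqrt(n_eff)

print(se_empirical, se_formula)  # should agree to within a few percent
```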