# Definition of Markov chain standard error (MCSE)

Hi Stan community,

I am curious about the estimation of the asymptotic variance \sigma^2 that appears in the definition of the MCSE for the mean of a parameter (see the MCMC central limit theorem).

My question is:
Why is the estimate of this asymptotic variance defined as:

\sigma^2 \approx \frac{\hat{\sigma}^2_n}{N_{eff}},

within the Stan Reference Manual's section on MCSE? After all, the expression for \sigma^2 in the MCMC central limit theorem is much more complex.

What is the justification for this estimate?

Notation:

• \hat{\sigma}_n is the standard deviation of the combined posterior draws across chains
• N_{eff} is the effective sample size.

Remark:
The starting point of the argument there is the formula:

\lim_{N \rightarrow\infty} N \; \mathrm{Var}[\hat{f}^\mathrm{MCMC}_N] = \mathrm{Var}_\pi[f] \; \tau[f],

where \mathrm{Var}_\pi[f] can be approximated by \hat{\sigma}^2_n. But I do not understand where the term \mathrm{Var}_\pi[f] in the above formula comes from in the first place.

To get the ball rolling: you appear to be overlooking the complexity of the computation of the effective sample size in the denominator of the expression from the Stan reference manual, as well as some deep similarities between the two versions. Perhaps @avehtari might chime in with a clear answer to some of the specifics of your question.

Thank you for your answer @jsocolar. Yes, the estimation of the effective sample size is indeed rather complex (it involves the autocorrelation at each lag t).

Let me explain a little bit more, so my question becomes clearer:

1. One approximates the expectation of a random variable X (a parameter, or a function of one) via MCMC samples X_1,\dots, X_N:
\hat{\mu} = \frac{1}{N} \sum^N_{i=1}X_i.
2. One wants to estimate the variance of \hat{\mu}, i.e. the (squared) Markov chain standard error of the mean.
3. According to https://en.wikipedia.org/wiki/Bienaymé's_identity:
N^2 \, \mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\sum^N_{i=1} X_i \right) = \sum^N_{i,j=1} \mathrm{Cov}\left(X_i, X_j\right)

How does one derive from this last formula the relation \sigma^2 \approx \frac{\hat{\sigma}^2_n}{N_{eff}}?
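As a quick numerical sanity check (a generic NumPy illustration of mine, not Stan code), note that Bienaymé's identity holds exactly for sample covariances as well: the sample variance of a sum equals the sum of all entries of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10_000 observations of a correlated 3-dimensional vector (X_1, X_2, X_3)
X = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.5],
         [0.2, 0.5, 1.0]],
    size=10_000,
)

var_of_sum = X.sum(axis=1).var(ddof=1)       # Var(X_1 + X_2 + X_3)
sum_of_covs = np.cov(X, rowvar=False).sum()  # sum_{i,j} Cov(X_i, X_j)
print(var_of_sum, sum_of_covs)               # identical up to float rounding
```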

Paralleling the ordinary central limit theorem, which assumes independent draws, in the MCMC central limit theorem, the standard error for a parameter \theta can be defined to be:

\displaystyle \textrm{se} = \frac{\textrm{sd}}{\sqrt{N_{\textrm{eff}}}}

where \textrm{se} is the standard error of \theta and \textrm{sd} is the standard deviation (in the posterior) of \theta. The autocovariance you’re looking at is used in the definition of N_{\textrm{eff}}. See, for example, page 287 of Bayesian Data Analysis (free download) for the derivation. For a comparison of different estimators of effective sample size and more discussion, see @avehtari’s Comparison of MCMC effective sample size estimators, which points to our paper (mostly Aki’s!) on revised estimators for it.
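To see the formula in action, here is a small simulation sketch (my own illustration, not Stan output) using an AR(1) chain, for which the integrated autocorrelation time is known in closed form: \tau = (1+\phi)/(1-\phi), so N_{\textrm{eff}} = N/\tau. The empirical standard deviation of the chain means across many independent replications matches \textrm{sd}/\sqrt{N_{\textrm{eff}}}:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n, n_chains = 0.5, 2_000, 2_000  # AR(1) coefficient, chain length, replications

# Simulate many independent AR(1) chains (vectorized over chains),
# each started from the stationary distribution.
x = np.empty((n, n_chains))
x[0] = rng.normal(size=n_chains) / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(size=n_chains)

sd = 1 / np.sqrt(1 - phi**2)              # stationary sd of the chain
tau = (1 + phi) / (1 - phi)               # integrated autocorrelation time (= 3 here)
n_eff = n / tau
se_formula = sd / np.sqrt(n_eff)          # se = sd / sqrt(N_eff)
se_empirical = x.mean(axis=0).std(ddof=1) # sd of the chain means
print(se_formula, se_empirical)           # the two should agree closely
```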

The point of my question is how do you get to the formula you quoted:

\mathrm{se} = \frac{\mathrm{sd}}{\sqrt{N_{eff}}}

For example, Geyer (2005) suggests using batch means to estimate what you called '\mathrm{se}', and I do not see how that approach and the above formula are connected.

Could you shed some light on this?
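For what it's worth, the batch-means idea can be sketched in a few lines (my own illustration; `batch_means_mcse` is a hypothetical helper, not Stan's or Geyer's code). The connection to the formula above: batch means are roughly independent once each batch is much longer than the autocorrelation time, so the ordinary i.i.d. standard-error formula applies to them.

```python
import numpy as np

def batch_means_mcse(x, n_batches=50):
    """MCSE of the mean via non-overlapping batch means (sketch)."""
    x = np.asarray(x, dtype=float)
    batch_size = len(x) // n_batches
    x = x[:n_batches * batch_size]  # drop the remainder
    means = x.reshape(n_batches, batch_size).mean(axis=1)
    # Batch means are ~independent when batch_size >> autocorrelation
    # time, so: se = sd(batch means) / sqrt(n_batches).
    return means.std(ddof=1) / np.sqrt(n_batches)

# Demo on an AR(1) chain, whose true se = sd * sqrt(tau / N) is known.
rng = np.random.default_rng(2)
phi, n = 0.5, 50_000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
se_theory = np.sqrt(1 / (1 - phi**2)) * np.sqrt((1 + phi) / (1 - phi) / n)
print(batch_means_mcse(x), se_theory)  # should be close
```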

Can’t you recover the expression from the definition of the standard error? The N would remain, and you would replace the variance of the sum of independent samples, \mathrm{Var}(T), with the covariance of the correlated samples, i.e. the covariance at all lags. That should lead to \sum_{t=-\infty}^{\infty} \rho_t, which, divided into N, yields N_{eff}.

Sorry, I cannot work through the whole derivation myself right now, but does that not work?

Hi @caesoma thanks for the reply,

The steps to arrive at the formula are:

1. Use Bienaymé’s identity for the variance of the mean
2. Transform it to get something like: Var(\hat{\mu}) = Var(S)\,\tau/N
3. Identify \tau as the integrated autocorrelation time, i.e. a sum over the autocorrelations at all lags t
4. Take the limit N \rightarrow \infty; assuming the autocorrelations are absolutely summable (see Cesàro summation), one gets \tau \approx 1+2 \sum^{\infty}_{t=1} \mathrm{Corr}(X_i, X_{i+t})

Definitions:

• \hat{\mu}: estimate of mean value
• S: the sample of values X_1, \dots, X_N
• N: number of sampling points
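Writing those steps out in full (assuming a stationary chain, with \gamma_t = \mathrm{Cov}(X_i, X_{i+t}), \rho_t = \gamma_t / \sigma^2, and \sigma^2 = \mathrm{Var}_\pi(X)):

\mathrm{Var}(\hat{\mu}) = \frac{1}{N^2} \sum^N_{i,j=1} \mathrm{Cov}(X_i, X_j) = \frac{1}{N^2} \sum^{N-1}_{t=-(N-1)} (N - |t|) \, \gamma_t = \frac{\sigma^2}{N} \sum^{N-1}_{t=-(N-1)} \left(1 - \frac{|t|}{N}\right) \rho_t.

If \sum_t |\rho_t| < \infty, the Cesàro-weighted sum on the right converges as N \rightarrow \infty to \tau = 1 + 2\sum^{\infty}_{t=1} \rho_t, so

\mathrm{Var}(\hat{\mu}) \approx \frac{\sigma^2 \tau}{N} = \frac{\sigma^2}{N_{eff}}, \quad N_{eff} = \frac{N}{\tau},

and taking square roots gives \mathrm{se} = \mathrm{sd} / \sqrt{N_{eff}}.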

Sorry for not following up earlier.

Standard error is just defined to be the standard deviation of the estimator. We know from the central limit theorem that for i.i.d. samples and the sample mean estimator, we have

\textrm{se} = \frac{\textrm{sd}}{\sqrt{N}}.

The formula involving effective sample size is just saying what it means to have an effective sample size of a given value—it’s what you plug in for N in the Markov chain CLT.

The trick’s in estimating N_{\textrm{eff}}. There are lots of ways to do that. We’re using an initial monotone sequence estimator in some of our interfaces and then a revised one based on ranks in the newer interfaces.
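To make that concrete, here is a single-chain NumPy sketch of a Geyer-style initial monotone sequence estimator (a simplified illustration only; Stan's actual estimator combines multiple chains, uses FFT-based autocovariances, and in newer versions rank-normalizes the draws first):

```python
import numpy as np

def neff_initial_monotone(x):
    """Effective sample size of one chain via an initial monotone
    sequence estimator (Geyer-style sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    acov = lambda t: (xc[:n - t] @ xc[t:]) / n  # autocovariance at lag t
    gamma0 = acov(0)
    # Pair sums Gamma_m = gamma_{2m} + gamma_{2m+1} are positive and
    # decreasing for a reversible chain; truncate once the estimates
    # stop being positive (initial positive sequence) ...
    pair_sums = []
    m = 0
    while 2 * m + 1 < n:
        g = acov(2 * m) + acov(2 * m + 1)
        if g <= 0:
            break
        pair_sums.append(g)
        m += 1
    if not pair_sums:
        return float(n)
    # ... and enforce monotonicity (initial monotone sequence).
    g_mono = np.minimum.accumulate(np.array(pair_sums))
    tau = -1.0 + 2.0 * g_mono.sum() / gamma0  # integrated autocorrelation time
    return n / tau

# Demo on an AR(1) chain with known tau = (1 + phi) / (1 - phi) = 3.
rng = np.random.default_rng(3)
phi, n = 0.5, 50_000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
print(neff_initial_monotone(x), n / 3)  # the two should be comparable
```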

One thing you can do is run a whole lot of replications and then calculate the standard error empirically by taking the standard deviation of the estimates. For example, you can run multiple Markov chains, compute the estimate in each one, and the standard deviation of those estimates across chains gives you the standard error.