# Definition of Markov chain standard error (MCSE)

Hi Stan community,

I am curious about the estimation of the asymptotic variance \sigma^2 that appears in the definition of the MCSE for the mean of a parameter (see the MCMC central limit theorem).

My question is:
Why is the estimate of this asymptotic variance defined as:

\sigma^2 \approx \frac{\hat{\sigma}^2_n}{N_{eff}},

within the Stan Reference Manual's section on MCSE? After all, the expression for \sigma^2 in the MCMC central limit theorem is much more complex.

What is the justification for this estimate?

Notation:

• \hat{\sigma}_n is the standard deviation of the combined posterior draws across chains
• N_{eff} is the effective sample size.

Remark:
The starting point of the argument there is the formula:

\lim_{N \rightarrow\infty} N \; \mathrm{Var}[\hat{f}^\mathrm{MCMC}_N] = \mathrm{Var}_\pi[f] \; \tau[f],

where \mathrm{Var}_\pi[f] can be approximated by \hat{\sigma}^2_n. But I do not understand where the term \mathrm{Var}_\pi[f] in the above formula comes from in the first place.

To get the ball rolling: you appear to be overlooking the complexity of the computation of the effective sample size in the denominator of the expression from the Stan reference manual, as well as some deep similarities between the two versions. Perhaps @avehtari might chime in with a clear answer to some of the specifics of your question.

Thank you for your answer @jsocolar. Yes, the estimation of the effective sample size is indeed rather complex (it involves the autocorrelation at each lag t).

Let me explain a little bit more, so my question becomes clearer:

1. One approximates the expectation of a random variable X (a parameter, or a function of one) via MCMC samples X_1,\dots, X_N:
\hat{\mu} = \frac{1}{N} \sum^N_{i=1}X_i.
2. One wants to estimate the variance of \hat{\mu}, i.e. the (squared) Markov chain standard error of the mean.
3. According to https://en.wikipedia.org/wiki/Bienaymé's_identity:
N^2 \, \mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\sum^N_{i=1} X_i \right) = \sum^N_{i,j=1} \mathrm{Cov}\left(X_i, X_j\right)

How does one derive from this last formula the relation \sigma^2 \approx \frac{\hat{\sigma}^2_n}{N_{eff}}?
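As a quick numerical sanity check (a generic NumPy illustration of mine, not Stan code), note that Bienaymé's identity holds exactly for sample covariances as well: the sample variance of a sum equals the sum of all entries of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10_000 observations of a correlated 3-dimensional vector (X_1, X_2, X_3)
X = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.5],
         [0.2, 0.5, 1.0]],
    size=10_000,
)

var_of_sum = X.sum(axis=1).var(ddof=1)       # Var(X_1 + X_2 + X_3)
sum_of_covs = np.cov(X, rowvar=False).sum()  # sum_{i,j} Cov(X_i, X_j)
print(var_of_sum, sum_of_covs)               # identical up to float rounding
```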

Paralleling the ordinary central limit theorem, which assumes independent draws, in the MCMC central limit theorem, the standard error for a parameter \theta can be defined to be:

\displaystyle \textrm{se} = \frac{\textrm{sd}}{\sqrt{N_{\textrm{eff}}}}

where \textrm{se} is the standard error of \theta and \textrm{sd} is the standard deviation (in the posterior) of \theta. The autocovariance you’re looking at is used in the definition of N_{\textrm{eff}}. See, for example, page 287 of Bayesian Data Analysis (free download) for the derivation. For a comparison of different estimators of effective sample size and more discussion, see @avehtari’s Comparison of MCMC effective sample size estimators, which points to our paper (mostly Aki’s!) on revised estimators for it.
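To see the formula in action, here is a small simulation sketch (my own illustration, not Stan output) using an AR(1) chain, for which the integrated autocorrelation time is known in closed form: \tau = (1+\phi)/(1-\phi), so N_{\textrm{eff}} = N/\tau. The empirical standard deviation of the chain means across many independent replications matches \textrm{sd}/\sqrt{N_{\textrm{eff}}}:

```python
import numpy as np

rng = np.random.default_rng(1)
phi, n, n_chains = 0.5, 2_000, 2_000  # AR(1) coefficient, chain length, replications

# Simulate many independent AR(1) chains (vectorized over chains),
# each started from the stationary distribution.
x = np.empty((n, n_chains))
x[0] = rng.normal(size=n_chains) / np.sqrt(1 - phi**2)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(size=n_chains)

sd = 1 / np.sqrt(1 - phi**2)              # stationary sd of the chain
tau = (1 + phi) / (1 - phi)               # integrated autocorrelation time (= 3 here)
n_eff = n / tau
se_formula = sd / np.sqrt(n_eff)          # se = sd / sqrt(N_eff)
se_empirical = x.mean(axis=0).std(ddof=1) # sd of the chain means
print(se_formula, se_empirical)           # the two should agree closely
```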

The point of my question is how do you get to the formula you quoted:

\mathrm{se} = \frac{\mathrm{sd}}{\sqrt{N_{eff}}}

For example, Geyer (2005) suggests using batch means to estimate what you called '\mathrm{se}', and I do not see how that approach and the above formula are connected.

Could you shed some light on this?
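For what it's worth, the batch-means idea can be sketched in a few lines (my own illustration; `batch_means_mcse` is a hypothetical helper, not Stan's or Geyer's code). The connection to the formula above: batch means are roughly independent once each batch is much longer than the autocorrelation time, so the ordinary i.i.d. standard-error formula applies to them.

```python
import numpy as np

def batch_means_mcse(x, n_batches=50):
    """MCSE of the mean via non-overlapping batch means (sketch)."""
    x = np.asarray(x, dtype=float)
    batch_size = len(x) // n_batches
    x = x[:n_batches * batch_size]  # drop the remainder
    means = x.reshape(n_batches, batch_size).mean(axis=1)
    # Batch means are ~independent when batch_size >> autocorrelation
    # time, so: se = sd(batch means) / sqrt(n_batches).
    return means.std(ddof=1) / np.sqrt(n_batches)

# Demo on an AR(1) chain, whose true se = sd * sqrt(tau / N) is known.
rng = np.random.default_rng(2)
phi, n = 0.5, 50_000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
se_theory = np.sqrt(1 / (1 - phi**2)) * np.sqrt((1 + phi) / (1 - phi) / n)
print(batch_means_mcse(x), se_theory)  # should be close
```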

Can’t you recover the expression from the definition of the standard error? The N would remain, and you would replace the variance of the sum of independent samples, \mathrm{Var}(T), with the covariance of the correlated samples, i.e. the covariance at all lags. That should lead to \sum_{t=-\infty}^{\infty} \rho_t, which, divided into N, yields N_{eff}.

Sorry, I cannot work through the whole derivation myself right now, but does that not work?

Hi @caesoma thanks for the reply,

The steps to arrive at the formula are:

1. Use Bienaymé’s identity for the variance of the mean
2. Transform it to get something like: Var(\hat{\mu}) = Var(S)\,\tau/N
3. Identify \tau as the integrated autocorrelation time, i.e. a sum over the autocorrelations at all lags t
4. Take the limit N \rightarrow \infty; assuming the autocorrelations are absolutely summable (see Cesàro summation), one gets \tau \approx 1+2 \sum^{\infty}_{t=1} \mathrm{Corr}(X_i, X_{i+t})

Definitions:

• \hat{\mu}: estimate of mean value
• S: the sample of values X_1, \dots, X_N
• N: number of sampling points
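Writing those steps out in full (assuming a stationary chain, with \gamma_t = \mathrm{Cov}(X_i, X_{i+t}), \rho_t = \gamma_t / \sigma^2, and \sigma^2 = \mathrm{Var}_\pi(X)):

\mathrm{Var}(\hat{\mu}) = \frac{1}{N^2} \sum^N_{i,j=1} \mathrm{Cov}(X_i, X_j) = \frac{1}{N^2} \sum^{N-1}_{t=-(N-1)} (N - |t|) \, \gamma_t = \frac{\sigma^2}{N} \sum^{N-1}_{t=-(N-1)} \left(1 - \frac{|t|}{N}\right) \rho_t.

If \sum_t |\rho_t| < \infty, the Cesàro-weighted sum on the right converges as N \rightarrow \infty to \tau = 1 + 2\sum^{\infty}_{t=1} \rho_t, so

\mathrm{Var}(\hat{\mu}) \approx \frac{\sigma^2 \tau}{N} = \frac{\sigma^2}{N_{eff}}, \quad N_{eff} = \frac{N}{\tau},

and taking square roots gives \mathrm{se} = \mathrm{sd} / \sqrt{N_{eff}}.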

Sorry for not following up earlier.

Standard error is just defined to be the standard deviation of the estimator. We know from the central limit theorem that for i.i.d. samples and the sample mean estimator, we have

\textrm{se} = \frac{\textrm{sd}}{\sqrt{N}}.

The formula involving effective sample size is just saying what it means to have an effective sample size of a given value—it’s what you plug in for N in the Markov chain CLT.

The trick’s in estimating N_{\textrm{eff}}. There are lots of ways to do that. We’re using an initial monotone sequence estimator in some of our interfaces and then a revised one based on ranks in the newer interfaces.
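To make that concrete, here is a single-chain NumPy sketch of a Geyer-style initial monotone sequence estimator (a simplified illustration only; Stan's actual estimator combines multiple chains, uses FFT-based autocovariances, and in newer versions rank-normalizes the draws first):

```python
import numpy as np

def neff_initial_monotone(x):
    """Effective sample size of one chain via an initial monotone
    sequence estimator (Geyer-style sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    acov = lambda t: (xc[:n - t] @ xc[t:]) / n  # autocovariance at lag t
    gamma0 = acov(0)
    # Pair sums Gamma_m = gamma_{2m} + gamma_{2m+1} are positive and
    # decreasing for a reversible chain; truncate once the estimates
    # stop being positive (initial positive sequence) ...
    pair_sums = []
    m = 0
    while 2 * m + 1 < n:
        g = acov(2 * m) + acov(2 * m + 1)
        if g <= 0:
            break
        pair_sums.append(g)
        m += 1
    if not pair_sums:
        return float(n)
    # ... and enforce monotonicity (initial monotone sequence).
    g_mono = np.minimum.accumulate(np.array(pair_sums))
    tau = -1.0 + 2.0 * g_mono.sum() / gamma0  # integrated autocorrelation time
    return n / tau

# Demo on an AR(1) chain with known tau = (1 + phi) / (1 - phi) = 3.
rng = np.random.default_rng(3)
phi, n = 0.5, 50_000
x = np.empty(n)
x[0] = 0.0
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
print(neff_initial_monotone(x), n / 3)  # the two should be comparable
```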

One thing you can do is run a whole lot of replications and then calculate the standard error empirically by taking the standard deviation of the estimates. For example, you can run multiple Markov chains, compute the estimate in each one, and the standard deviation of those estimates across chains gives you the standard error.