I am curious about the estimation of the asymptotic variance \sigma^2, appearing in the definition of the MCSE for the mean value of a parameter (see MCMC central limit theorem).
My question is:
Why is the estimate of this asymptotic variance defined as:
\lim_{N \rightarrow\infty} N \; \mathrm{Var}[\hat{f}^\mathrm{MCMC}_N] = \mathrm{Var}_\pi[f] \; \tau_l[f],
where \mathrm{Var}_\pi[f] can be approximated by \hat{\sigma}^2_n. But I do not understand where the term \mathrm{Var}_\pi[f] in the above formula comes from in the first place.
you appear to be overlooking the complexity of the computation of the effective sample size in the denominator of the expression from the Stan reference manual, as well as some deep similarities between the two versions. Perhaps @avehtari might chime in with a clear answer to some of the specifics of your question.
Paralleling the ordinary central limit theorem, which assumes independent draws, in the MCMC central limit theorem, the standard error for a parameter \theta can be defined to be:
where \textrm{se} is the standard error of \theta and \textrm{sd} is the standard deviation (in the posterior) of \theta. The autocovariance you’re looking at is used in the definition of N_{\textrm{eff}}. See, for example, page 287 of Bayesian Data Analysis (free download) for the derivation. For a comparison of different estimators of effective sample size and more discussion, see @avehtari’s Comparison of MCMC effective sample size estimators, which points to our paper (mostly Aki’s!) on revised estimators for it.
The point of my question is how do you get to the formula you quoted:
\mathrm{se} = \frac{\mathrm{sd}}{\sqrt{N_{eff}}}
For example, Geyer (2005) suggest to use batch means to estimate what you called ‘\mathrm{se}’.
And I do not see how this approach and the above formula are connected.
Can’t you recover the expression from the definition of Standard Error, the N would remains, and you would have to replace the variance of the sum of independent samples Var(T) with the covariance of correlated samples, i.e. covariance at all lags. That should lead to \sum_{t=-\infty}^{\infty} \rho_t, and divided by N would lead to N_{eff}.
Sorry I cannot make the whole derivaiton myself right now, but does that not work?
Hi @caesoma thanks for the reply,
I had figured it out and forgot about this open question.
The steps to derive at the formula are:
Use Bienaymé’s identity for the variance of the mean
Transform it to get something like: Var(\hat{\mu}) = Var(S)\tau_t/N
Identify the term for the autocorrelation at lag t , i.e. \tau_t
Take the limit N \rightarrow \infty and assuming that the sum over the correlations is ‘absolutely summable’ (see Cesaro sum), one gets \tau_t \approx 1+2 \sum^{\infty}_{t=1} Corr(X_i, X_{i+t})
Standard error is just defined to be the standard deviation of the estimator. We know from the central limit theorem that for i.i.d. samples and the sample mean estimator, we have
\textrm{se} = \frac{\textrm{sd}}{\sqrt{N}}.
The formula involving effective sample size is just saying what it means to have an effective sample size of a given value—it’s what you plug in for N in the Markov chain CLT.
The trick’s in estimating N^{\textrm{eff}}. There are lots of ways to do that. We’re using an initial monotone sequence estimator in some of our interfaces and then a revised one based on ranks in the newer interfaces.
One thing you can do is run a whole lot of estimates and then calculate the standard errors empirically by taking the standard deviation of the estimates. For example, you can run multiple Markov chains and estimate error in each one and the standard deviation there gives you standard error.