On a higher level I can summarize the paper as follows:
- The effective sample size can be calculated with a simple formula from an estimated parametric model with an automatically selected model order using fast and robust algorithms.
- The error in the estimated effective sample size is close to the theoretical lower bound on estimation error (= Cramer Rao lower bound).
This paper compares this method to summing over the sample autocovariance, which is closer to the current algorithm in Stan:
Here’s the algorithm in Python, starting from the sample autocorrelation function acor_sample
, as is also used in the current n_eff computation. An AR model is selected using the order selection criterion GIC3 (similar to AIC, but with penalty factor 3):
from statsmodels.tsa.stattools import levinson_durbin, acovf
pmax = np.int(np.min([100,np.floor(niter/2)]))
_,_,pacf,res,phi = levinson_durbin(acor_sample,pmax,isacov=True)
res[0] = 1
p = np.arange(pmax+1)
gic3 = niter_tot*np.log(res)+3*p
p_sel = np.argmin(gic3)
ar_sel = phi[:p_sel+1,p_sel]
sigma2_eps_norm = res[p_sel]
alpha = sigma2_eps_norm/(1-ar_sel.sum())**2
n_eff = niter*chains/alpha
Most of the computations are done in levinson_durbin
, < 20 lines of Python code. The AR estimation algorithm, also known as the Yule-Walker algorithm, is also available in R.
For completeness, here is the Python code for computation of the autocorrelation starting from the samples s
in an niter
x chains
numpy array:
import numpy as np
sd = s-s.mean()
acov_sample = np.array([acovf(x,demean=False,fft=True) for x in sd.T]).mean(axis=0)
acor_sample = acov_sample/acov_sample[0]
Even though levinson_durbin
above is implemented in pure Python, execution time is not too bad: I clocked 0.6 ms for the autocorrelation computation, and 4.4 ms for Levinson-Durbin. I expect a 5-10x speedup when converting to C++. If pressed for computational resources, one may consider reducing the maximum order (100). For example, using 50 instead reduces computation time to 1 ms. Please note that the model order does not correspond to the correlation length; even an AR(1) model can have an arbitrary large correlation length.