Here are experimental results comparing the current N_eff computation to a version using Geyer’s initial positive sequence. For the first three examples the true parameter values are known; for the 8 schools example the true values were estimated using 4 million draws. The reference result for sampling efficiency was obtained by repeating the sampling 100-1000 times. Each repeat used 4 chains with a total of 4000 draws saved after warmup.
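As a concrete reading of how such a reference can be computed: if a single run behaved like N_eff independent draws, the variance of its posterior-mean estimate would be the posterior variance divided by N_eff. The sketch below illustrates that idea; the function name, its arguments, and the use of NumPy are my own choices, not necessarily the exact script behind the figures.

import numpy as np

def reference_n_eff(run_means, true_var):
    # run_means: posterior-mean estimates of one scalar quantity from R
    #            independent repeated runs (each run = 4 chains, 4000 draws)
    # true_var:  known (or very precisely estimated) posterior variance
    # If one run were worth N_eff independent draws, Var(run mean) would be
    # true_var / N_eff, so N_eff ~= true_var / Var(run_means).
    return true_var / np.var(run_means, ddof=1)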
Normal distribution with diagonal covariance and increasing number of dimensions
data {
  int<lower=0> D; // number of dimensions
}
parameters {
  vector[D] mu;
}
model {
  mu ~ normal(0, 1);
}
NUTS can be superefficient (the Markov chain is antithetic), but the current estimate caps N_eff<=N.
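To make the comparison concrete, here is a single-chain sketch of the initial positive sequence rule: pair sums Gamma_k = rho_{2k} + rho_{2k+1} of the autocorrelations are accumulated until the first non-positive pair, and N_eff = N / tau with tau = -1 + 2 * (sum of the kept pairs). Since individual autocorrelations can be negative for an antithetic chain, tau can fall below 1 and N_eff can exceed N. The actual multi-chain estimator also uses between- and within-chain variances, which this sketch omits; the function name and the direct autocovariance loop are illustrative.

import numpy as np

def n_eff_geyer_positive(x):
    # x: 1-D array of draws of a scalar quantity from a single chain
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    # sample autocorrelations rho_0, ..., rho_{n-1}
    acov = np.array([np.dot(xc[:n - t], xc[t:]) / n for t in range(n)])
    rho = acov / acov[0]
    # Geyer's initial positive sequence: keep pair sums while they stay positive
    tau = -1.0
    k = 0
    while 2 * k + 1 < n:
        pair = rho[2 * k] + rho[2 * k + 1]
        if pair <= 0:
            break
        tau += 2.0 * pair
        k += 1
    return n / tau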
Normal distribution with diagonal covariance, constrained to positive values, and increasing number of dimensions. Due to the constraint the posterior distribution is not Gaussian.
data {
  int<lower=0> D; // number of dimensions
}
parameters {
  vector<lower=0>[D] mu;
}
model {
  mu ~ normal(0, 1);
}
Again the current estimate caps N_eff<=N, while true N_eff can be larger than N.
Normal distribution with a covariance matrix whose diagonal values are set to 1 and all off-diagonal values set to rho (s in the code below), with varying rho and the number of dimensions D=16 or D=64.
data {
  int<lower=0> D; // number of dimensions
  real<lower=0,upper=1> s; // correlation
}
transformed data {
  vector[D] zero = rep_vector(0.0, D);
  cov_matrix[D] S;
  cholesky_factor_cov[D] L;
  for (i in 1:D) {
    for (j in 1:(i-1)) {
      S[i,j] = s;
      S[j,i] = s;
    }
    S[i,i] = 1.0;
  }
  L = cholesky_decompose(S);
}
parameters {
  vector[D] mu;
}
model {
  mu ~ multi_normal_cholesky(zero, L);
}
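For context, one way such an experiment could be driven from Python is sketched below with CmdStanPy; the file name corr_normal.stan, the particular data values, and the sampler settings are assumptions for illustration, not the exact setup behind the figures.

from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="corr_normal.stan")  # the model above
fit = model.sample(
    data={"D": 16, "s": 0.5},  # number of dimensions and common correlation
    chains=4,
    iter_warmup=1000,
    iter_sampling=1000,        # 4 x 1000 = 4000 saved draws
    seed=1,
)
draws = fit.stan_variable("mu")  # shape (4000, 16); input for the N_eff estimators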
The upper lines (with higher N_eff) are for D=16 and the bottom lines (with lower N_eff) are for D=32. Again the current estimate caps N_eff<=N, while the true N_eff can be larger than N. In addition we see that the current N_eff estimate can also overestimate N_eff, likely because the autocorrelation series is truncated too early.
Finally, the 8 schools example with a non-centered parameterization.
data {
  int<lower=0> J; // number of schools
  vector[J] y; // estimated treatment effects
  vector<lower=0>[J] sigma; // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  tau ~ normal(0, 5);
  y ~ normal(theta, sigma);
}
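For completeness, this model is usually fit with the standard 8 schools data (Rubin, 1981), listed here in the dictionary form the earlier CmdStanPy sketch would take, only so the model is runnable as shown.

eight_schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],        # estimated treatment effects
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],  # standard errors
}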
The difference between the estimate from repeated simulations and the estimate using Geyer’s initial positive sequence is mostly due to the finite number of repeated Stan runs used.
In most practical cases N_eff is probably rarely much higher than N, but all these results show that N_eff estimated using Geyer’s initial positive sequence is more accurate.
I also ran the same experiments with Geyer’s initial monotone sequence estimator. The results are very close to those with the initial positive sequence estimator (as in my old experiments years ago, too).
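For reference, the only difference in the initial monotone sequence estimator is an extra adjustment that forces the kept pair sums to be non-increasing before they are summed; a minimal sketch of that step on top of the positive-sequence sketch above (the helper name is mine):

import numpy as np

def monotone_pair_sums(pair_sums):
    # pair_sums: the positive pair sums Gamma_k kept by the initial positive
    # sequence rule; cap each by the running minimum so the sequence is
    # non-increasing, then tau = -1 + 2 * sum(result) as before.
    return np.minimum.accumulate(np.asarray(pair_sums, dtype=float))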