N_eff BDA3 vs. Stan

avehtari · December 29, 2017, 7:17pm

Here are experimental results comparing the current N_eff computation to a version using Geyer’s initial positive sequence. For three first examples the true parameter values are known and for 8 schools example true values were estimated using 4 million draws. The reference result for sampling efficiency is obtained by repeating sampling 100-1000 times. Each repeat used 4 chains with total of 4000 draws saved after warm-up.

Normal distribution with diagonal covariance and increasing number of dimensions

data {
  int<lower=0> D;           // number of dimensions
}
parameters {
  vector[D] mu; 
}
model {
  mu ~ normal(0, 1);
}

normal_neff_geyer_ipse
NUTS can be superefficient (antithetic Markov chain), but the current estimate caps N_eff<=N.

Normal distribution with diagonal covariance, constraint to all positive values and increasing number of dimensions. Due to the constraint the posterior distribution is not Gaussian.

data {
  int<lower=0> D;           // number of dimensions
}
parameters {
  vector<lower=0>[D] mu; 
}
model {
  mu ~ normal(0, 1);
}

normalplus_neff_geyer_ipse
Again the current estimate caps N_eff<=N, while true N_eff can be larger than N.

Normal distribution with diagonal values set to 1 and all off-diagonal values set to rho with varying rho, and the number of dimensions D=16 or D=64.

data {
  int<lower=0> D;           // number of dimensions
  real<lower=0,upper=1> s;  // correlation
}
transformed data {
  vector[D] zero = rep_vector(0.0, D);
  cov_matrix[D] S;
  cholesky_factor_cov[D] L;
  for (i in 1:D) {
    for (j in 1:(i-1)) {
      S[i,j]=s;
      S[j,i]=s;
    }
    S[i,i]=1.0;
  }
  L = cholesky_decompose(S);
}
parameters {
  vector[D] mu;
}
model {
  mu ~ multi_normal_cholesky(zero, L);
}

normalrho_neff_geyer_ipse
Upper lines (with higher N_eff’s) are for D=16 and bottom lines (with lower N_eff’s) are for D=32. Again the current estimate caps N_eff<=N, while true N_eff can be larger than N. In addition we see that the current N_eff estimate can also overestimate N_eff. This is likely due to truncating autocorrelation series too early.

FInally 8 schools with non-centered parametrization

data {
  int<lower=0> J;           // number of schools 
  vector[J] y;              // estimated treatment effects
  vector<lower=0>[J] sigma; // s.e. of effect estimates 
}
parameters {
  real mu; 
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta;
  theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  tau ~ normal(0, 5);
  y ~ normal(theta, sigma);
}

8schools_neff_geyer_ipse
The difference between the estimate from repeated simulations and the estimate using Geyer’s initial positive is mostly due to finite number of repeated Stan runs used.

It’s likely that in most practical cases N_eff is rarely much higher than N, but all these results show that N_eff estimated using Geyer’s initial positive sequence is more accurate.

I also did run the same experiments with Geyer’s initial monotone sequence estimator. The results are really close to the results with initial positive sequence estimator (as in my old experiments years ago, too).

Topic		Replies	Views
Is the effective sample size estimator using the variogram in BDA3 always valid? General loo , model-comparison , time-series	4	2397	March 28, 2022
Stan for Bayesian Hierarchical Models Publicity	6	1401	July 20, 2018
Improving estimates for fast CAR parameters Modeling	2	453	September 9, 2019
Meaning of NaN for se_mean, n_eff and Rhat in Stan output General	2	1889	January 20, 2021
Stan's ESS (and monotone autocorrelations) Algorithms mcmc	5	897	March 13, 2020

N_eff BDA3 vs. Stan

Related topics