Effective sample size discrepancy between loo and rstan packages

Apologies if this topic is in the wrong place.

I’ve been using a simple toy model to experiment with the rstan and loo packages (the most current version of each, as far as I’m aware), and I noticed what appears to be a discrepancy between the relative effective sample sizes I get from the summary method in rstan and the relative_eff function in loo. The toy model is as follows.

data {
  int<lower=0> N;
  vector[N] y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  y ~ normal(mu, sigma);
  mu ~ normal(-1, 1);
  sigma ~ cauchy(0, 1);
}

generated quantities {
  real lik[N];  // pointwise likelihood on the natural scale, for loo::relative_eff
  for (i in 1:N) {
    lik[i] = exp(normal_lpdf(y[i] | mu, sigma));
  }
}

The data are a small sample drawn with rnorm. Note that I am generating the likelihoods directly by exponentiating the log densities in the generated quantities block.

Creating the model and extracting the relative effective sample sizes from the stanfit method gives me one set of results:

stan_runs = stan(file = 'code/smallstantest.stan', data = list(y = y, N = N), verbose = TRUE, iter = 500, save_warmup=FALSE)
summary(stan_runs, pars = c('lik'))$summary[,'n_eff']/1000 # 250 post-warmup iterations * 4 chains = 1000 draws

 0.4850813   0.7350164   0.5353286   0.7060084   0.7548567   0.7030208   0.8220460   0.5953139   0.4300723   0.7909692
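To make sure I’m comparing apples to apples, my understanding of the quantity being reported is: relative efficiency = effective sample size divided by the total number of post-warmup draws. As a hypothetical illustration (this is my own Python sketch, not rstan or loo internals), for a stationary AR(1) chain with lag-1 autocorrelation rho, the integrated autocorrelation time is (1 + rho)/(1 − rho), so the relative efficiency depends only on rho:

```python
# Hypothetical illustration, not rstan/loo internals: for a stationary
# AR(1) chain with lag-1 autocorrelation rho, the integrated
# autocorrelation time is (1 + rho) / (1 - rho), so
#   ESS ~= N * (1 - rho) / (1 + rho)
# and the relative efficiency ESS / N is (1 - rho) / (1 + rho).
def relative_efficiency_ar1(rho):
    """Relative effective sample size of an ideal AR(1) chain."""
    return (1.0 - rho) / (1.0 + rho)

# Independent draws are fully efficient; positive autocorrelation
# shrinks the effective sample size.
print(relative_efficiency_ar1(0.0))  # 1.0
print(relative_efficiency_ar1(0.5))  # ~0.333
```

So both sets of numbers above are plausible relative efficiencies; the question is why the two estimators disagree on the same draws.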

But using the relative_eff function provided by loo gives me slightly different (though broadly similar) results:

relative_eff(extract(stan_runs, pars = c('lik'), permuted=FALSE))

0.4685559 0.7308909 0.5181340 0.7051706 0.7429508 0.7117508 0.7870215 0.5970975 0.4180794 0.7728746

What is the loo package doing differently? Looking through the source code didn’t reveal any obvious differences in methods. Is it some kind of warmup thing I’m missing? Perhaps something to do with permutation of chains? And to the extent that there is a difference, which is “correct”?

Any help is greatly appreciated.


Update: there appear to be two differences:

  1. The autocovariance in the rstan function is calculated using an FFT, while the loo package simply uses the acf function from R’s stats package.

  2. The rstan function (https://github.com/stan-dev/rstan/blob/develop/rstan/rstan/R/monitor.R) uses Geyer’s initial sequences to sum up the autocorrelations, while the loo function (https://github.com/stan-dev/loo/blob/master/R/effective_sample_sizes.R) does not.
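To convince myself that the second difference matters, I sketched both truncation rules in Python. This is a simplified single-chain illustration under my own assumptions (the function names are mine, and the real implementations also pool variance across chains and compute the autocovariance via FFT), not the actual rstan or loo code:

```python
import numpy as np

def autocorrelations(x):
    """Sample autocorrelation at every lag, by direct summation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    var = np.dot(xc, xc) / n
    return np.array([np.dot(xc[: n - k], xc[k:]) / (n * var) for k in range(n)])

def ess_plain_cutoff(x, max_lag):
    """Sum autocorrelations up to a fixed lag: roughly what a plain
    acf-based estimator does without an adaptive stopping rule."""
    rho = autocorrelations(x)
    tau = 1.0 + 2.0 * rho[1 : max_lag + 1].sum()
    return len(x) / tau

def ess_geyer(x):
    """Geyer's initial positive sequence: accumulate the pair sums
    rho[2m] + rho[2m+1] only while they remain positive."""
    rho = autocorrelations(x)
    tau = -1.0
    m = 0
    while 2 * m + 1 < len(rho):
        pair = rho[2 * m] + rho[2 * m + 1]
        if pair <= 0.0:
            break
        tau += 2.0 * pair
        m += 1
    return len(x) / tau

# An AR(1) chain with positive autocorrelation: the two truncation
# rules generally give (slightly) different effective sample sizes.
rng = np.random.default_rng(1)
chain = np.empty(2000)
chain[0] = rng.standard_normal()
for t in range(1, len(chain)):
    chain[t] = 0.7 * chain[t - 1] + rng.standard_normal()
print(ess_plain_cutoff(chain, max_lag=50), ess_geyer(chain))
```

On draws like these the two estimates land in the same ballpark but rarely coincide, which matches the small discrepancies I’m seeing between the two packages.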

Is there anyone experienced with both of these packages who can share some insight on the reasons for these different methods, and the impact of those differences?

Did you have a look at this thread?

Thanks for the report @ShaunMcD !

This is my fault. We have an open issue for this (https://github.com/stan-dev/loo/issues/85), but I forgot to update loo after this was fixed in monitor.R. I’ll fix it tomorrow by copying the updated code from monitor.R, but it will take some time before the next CRAN release of the loo package (we should fix some other open issues, too).

There’s a (longish) discussion of the differences in the thread “N_eff BDA3 vs. Stan”.


Many thanks for the clarification to both of you!
I see that the relevant pull request has now been merged. :)