I may have stumbled upon a scenario that falsely triggers the Tail ESS warning. Here’s some R code to create fake data for a logistic regression:
library(rstan)
set.seed(0)
B0 <- 1
B1 <- 0.5
N <- 100
x <- rnorm(N)
y <- rbinom(N, 1, exp(B0 + B1*x)/(exp(B0 + B1*x) + 1))
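As a side note, the success probability above is the inverse logit of the linear predictor, which base R also provides as plogis(). Here is a small check that reuses the same simulated values; the names eta, p_manual, and p_plogis are only illustrative and not part of the original script:

```r
set.seed(0)
B0 <- 1
B1 <- 0.5
N <- 100
x <- rnorm(N)

eta <- B0 + B1 * x                      # linear predictor
p_manual <- exp(eta) / (exp(eta) + 1)   # expression used in the simulation
p_plogis <- plogis(eta)                 # base R inverse logit

# TRUE if the two agree up to floating-point error
isTRUE(all.equal(p_manual, p_plogis))
```

Either form should give the same simulated y for a given seed, so this is purely a readability choice.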
I use the following model, which includes sample Bernoulli draws (y_rep) for posterior predictive checks:
data {
  int<lower = 0> N;
  vector[N] x;
  int<lower = 0> y[N];
}
parameters {
  real B0;
  real B1;
}
model {
  y ~ bernoulli_logit(B0 + B1 * x);
}
generated quantities {
  int<lower = 0> y_rep[N];
  for (n in 1:N) {
    y_rep[n] = bernoulli_logit_rng(B0 + B1 * x[n]);
  }
}
And I run the model, with the Stan program above stored as a string in stanmodelcode:
dat <- list(N = N, x = x, y = y)
fit <- stan(model_code = stanmodelcode, data = dat, seed = 0)
I get the familiar warning message about Tail ESS:
Warning message:
Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
When I check n_eff in the summary of the fit and Tail_ESS using monitor, all of the ESS and Rhat numbers are robust. And when I get rid of the generated quantities block, the model runs fine with no warnings.
I’m aware that a constant value, either as a parameter or in the generated quantities block, will mess up the Rhat and ESS calculations, triggering a warning. But none of the y_rep variables are all 0s or 1s.
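For anyone who wants to repeat that check, here is a minimal sketch of one way to look for constant columns. The helper constant_columns and the matrix fake_y_rep are hypothetical; the fake matrix only stands in for the real draws so that the snippet runs without a fitted model:

```r
# Return the indices of columns whose draws never change.
constant_columns <- function(draws) {
  which(apply(draws, 2, function(col) length(unique(col)) == 1))
}

# Stand-in for the real draws (hypothetical data, 4000 draws x 5 outcomes).
# With a fitted model this would be rstan::extract(fit, "y_rep")$y_rep.
set.seed(1)
fake_y_rep <- matrix(rbinom(4000 * 5, 1, 0.7), nrow = 4000)
fake_y_rep[, 3] <- 1L  # force one constant column for illustration

constant_columns(fake_y_rep)
```

An empty result on the real y_rep draws would confirm that no column is all 0s or all 1s.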
My guess is that since y_rep is binary, certain diagnostics can’t be calculated, such as MCSE_Q50 or MCSE_Q75, which are NA in the output of the monitor function. Note that running ess_tail on any of the y_rep indices also produces NA.
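Here is a rough sketch of why that might happen. It assumes Tail ESS is the minimum of the ESS values for the indicators of the 5% and 95% quantiles, and that the ESS routine returns NA for a constant input. Both are my understanding rather than something I have verified in the rstan source, and the draws below are simulated stand-ins for a single y_rep[n]:

```r
set.seed(2)
draws <- rbinom(4000, 1, 0.7)  # hypothetical binary draws for one y_rep[n]

q05 <- quantile(draws, 0.05)
q95 <- quantile(draws, 0.95)

lower_tail <- draws <= q05  # varies across draws when q05 is 0
upper_tail <- draws <= q95  # constant when q95 is 1, since every draw is <= 1

c(q05 = unname(q05), q95 = unname(q95))
length(unique(lower_tail))  # number of distinct values in the lower indicator
length(unique(upper_tail))  # number of distinct values in the upper indicator

# If the ESS of a constant indicator is NA, the minimum is NA too:
min(1000, NA)
```

If that reading is right, the NA comes from the upper-tail indicator being constant for binary draws even though the draws themselves are not, which would explain why the warning appears only when y_rep is included.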