Bernoulli variables in generated quantities causing tail ESS warning?

I may have stumbled upon a scenario that falsely triggers the Tail ESS warning. Here’s some R code to create fake data for a logistic regression:

library(rstan)

set.seed(0)
B0 <- 1
B1 <- 0.5
N <- 100

x <- rnorm(N)
y <- rbinom(N, 1, exp(B0 + B1*x)/(exp(B0 + B1*x) + 1))

I use the following model, which includes simulated Bernoulli draws (y_rep) for posterior predictive checks:

data {
  int<lower = 0> N;
  vector[N] x;
  int<lower = 0> y[N];
}
parameters {
  real B0;
  real B1;
}
model {
  y ~ bernoulli_logit(B0 + B1 * x);
}
generated quantities {
  int<lower = 0> y_rep[N];
  for (n in 1:N) {
    y_rep[n] = bernoulli_logit_rng(B0 + B1 * x[n]);
  }
}

And I run the model:

dat <- list(N = N, x = x, y = y)
# stanmodelcode is a character string holding the Stan program above
fit <- stan(model_code = stanmodelcode, data = dat, seed = 0)

I get the familiar warning message about Tail ESS:

Warning message:
Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.

When I check n_eff in the summary of the fit and Tail_ESS using monitor, all of the reported ESS and Rhat values look fine. And when I remove the generated quantities block, the model runs with no warnings.

I’m aware that a constant value, either as a parameter or in the generated quantities block, will mess up the Rhat and ESS calculations, triggering a warning. But none of the y_rep variables are all 0s or 1s.
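
A minimal sketch of that check, in base R only. The y_rep matrix here is simulated as a stand-in (an assumption, using the true B0 and B1 rather than posterior draws); with a real fit it would come from rstan::extract(fit, "y_rep")$y_rep. The names p, n_draws and n_distinct are illustrative.

```r
# Stand-in for the draws-by-N matrix of y_rep (assumption: 4000 draws,
# generated from the true B0 = 1 and B1 = 0.5 instead of posterior draws).
set.seed(0)
N <- 100
n_draws <- 4000
x <- rnorm(N)
p <- plogis(1 + 0.5 * x)

# One column per observation, one row per draw.
y_rep <- sapply(p, function(pn) rbinom(n_draws, 1, pn))

# Count the distinct values in each column; a count of 1 means the
# column is constant (all 0s or all 1s).
n_distinct <- apply(y_rep, 2, function(v) length(unique(v)))
print(any(n_distinct == 1))
```

If this prints FALSE, no element of y_rep is constant across draws, so a constant sequence is not what triggers the warning.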

My guess is that since y_rep is binary, certain diagnostics can’t be calculated, such as MCSE_Q50 or MCSE_Q75, which are NA in the output of monitor. Note that running ess_tail on any element of y_rep also produces NA.

Hi,
Thanks for the clear code example. I don’t get any warnings with rstan_2.21.2.
What version are you using?

Yes, this is correct. Tail-ESS is computed from the indicator sequences

I <- theta <= quantile(theta, 0.05)

and

I <- theta <= quantile(theta, 0.95)

For binary data, it’s likely that quantile(theta, 0.95) is 1, in which case the corresponding I is constant and ESS returns NA. The ess_tail function returns the minimum of the 0.05 and 0.95 tail-ESS values, which is then NA.
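
The constant-indicator effect can be sketched in base R without running Stan. The vector theta below is a simulated stand-in for the draws of a single y_rep[n] (an assumption: 4000 draws with P(1) = 0.7), and is_constant is a hypothetical helper, not part of rstan.

```r
# Simulated stand-in for the draws of one binary generated quantity.
set.seed(1)
theta <- rbinom(4000, 1, 0.7)

# Tail quantiles of a 0/1 sequence can only land on 0 or 1 here.
q05 <- unname(quantile(theta, 0.05))
q95 <- unname(quantile(theta, 0.95))

# Indicator sequences that the tail-ESS is based on.
I05 <- theta <= q05
I95 <- theta <= q95

# Hypothetical helper: TRUE when a sequence takes a single value,
# i.e. has zero variance and no usable autocorrelation estimate.
is_constant <- function(v) length(unique(v)) == 1

print(c(q05 = q05, q95 = q95))
print(c(I05_constant = is_constant(I05), I95_constant = is_constant(I95)))
```

Here the 0.95 quantile is 1, so I95 is TRUE for every draw. An ESS estimate on that constant sequence is undefined, and taking the minimum over both tails carries the NA through.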

Quantiles are not very useful for a binary variable (there is no tail), so I would think that NA for tail-ESS is fine. We may want to think about what to do for other discrete variables with a small number of observed states.

I’m using rstan 2.19.3

That explains the difference in warnings: the later version was changed so that it no longer warns about NAs.
