Low E-BFMI examples

Can somebody suggest a good example (or several) of a model that runs without divergences but yields low E-BFMI? The data must be simulated or otherwise shareable. Ideally, the model would also yield decent R-hat for all parameters, but this isn’t crucial. Bonus points for models that are maximally simple to understand and fast to fit.

parameters {
  real x;
}
model {
  // nu = 0.1 gives extremely heavy tails
  x ~ student_t(0.1, 0, 1);
}

yields no divergent transitions with adapt_delta=.99 and an E-BFMI of roughly 0.1, but plenty of max-treedepth warnings and bad R-hat.
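
For concreteness, here is a minimal cmdstanr sketch of that check; the file name heavy_tail.stan is hypothetical and just refers to the Stan program above.

library(cmdstanr)

mod <- cmdstan_model("heavy_tail.stan")  # the Stan program above
fit <- mod$sample(adapt_delta = 0.99)

# Prints divergences, max-treedepth hits, and E-BFMI per chain; per the
# description above, expect E-BFMI around 0.1 with no divergences.
fit$diagnostic_summary()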


You can tweak a funnel to manifest E-FMI problems without divergences (or with only a few, depending on the precise details). See, for example, the nine-dimensional example in Hierarchical Modeling. The higher the dimension of the funnel, the more the E-FMI problem dominates over divergences, but be careful: for high-enough dimensions the chains might not even try to explore small enough values for the empirical E-FMI diagnostic to notice (see, for example, the difference between chains 1/2 and chains 3/4 in that example).
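
A minimal sketch of that kind of experiment with cmdstanr is below; the model form (a Neal-style centered funnel with adjustable dimension N) and its scales are assumptions on my part, not the exact model from that case study.

library(cmdstanr)

# A centered funnel whose dimension N can be raised to trade
# divergences for E-FMI problems (assumed form)
funnel_code <- "
data {
  int<lower=1> N;
}
parameters {
  real tau;
  vector[N] theta;
}
model {
  tau ~ normal(0, 3);
  theta ~ normal(0, exp(tau / 2));
}
"
funnel_mod <- cmdstan_model(write_stan_file(funnel_code))
fit <- funnel_mod$sample(data = list(N = 9))
fit$diagnostic_summary()  # expect the E-BFMI warning to dominate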

In lower dimensions a heavy-tailed target also runs into E-FMI problems. See, for example, Section 4.2 of the original E-FMI paper, [1604.00695] Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo. That said, when fitting these models you have to be careful to turn off the inverse metric adaptation, because the variances that the adaptation is trying to estimate no longer exist.


I’ve noticed that if I start with a funnel that yields low E-FMI and no divergences and then add a big multivariate normal to the model (fixed sd, no funnel), the E-FMI problems disappear but divergences start popping up. If I then increase adapt_delta I can squash the divergences, but before they are gone entirely I usually start to see E-FMI problems again. The complementarity of E-FMI and divergences as diagnostics is not something I had previously appreciated (other than in the vaguest sense of “both metrics are useful”) and is really cool!

data {
  int<lower=0> K;  // number of funnel parameters
  int<lower=0> N;  // number of independent standard-normal parameters
}

parameters {
  real<lower=0> tau;
  vector[K] phi;
  vector[N] y;
}

model {
  tau ~ normal(0, 5);
  phi ~ normal(0, tau);  // centered funnel
  y ~ std_normal();      // independent of the funnel
}

library(cmdstanr)
fmod <- cmdstan_model("funnel2.stan")

set.seed(123)

# funnel only: E-FMI warnings but no divergences
data <- list("K" = 20, "N" = 0)
a <- fmod$sample(data = data)
max(a$summary()$rhat)

# funnel plus a big independent normal block: divergences, no E-FMI warning
data <- list("K" = 20, "N" = 80)
b <- fmod$sample(data = data)
max(b$summary()$rhat)

# raising adapt_delta squashes the divergences, but the E-FMI warnings return
data <- list("K" = 20, "N" = 80)
d <- fmod$sample(data = data, adapt_delta = 0.99)
max(d$summary()$rhat)
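
The diagnostics themselves can also be pulled out explicitly; this is just cmdstanr’s built-in summary applied to the three fits above, with the comments restating the behavior I described.

a$diagnostic_summary()  # low E-BFMI, no divergences
b$diagnostic_summary()  # divergences, no E-BFMI warning
d$diagnostic_summary()  # divergences squashed, low E-BFMI back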

Because the y are independent of the funnel parameters tau and phi, this is almost surely due to adaptation, which in some sense couples the parameters even if they’re probabilistically independent.

The standard normal target for the y motivates a much higher step size than the funnel parameters do, and overall this leads to a more aggressive adaptation. The more aggressive adaptation then leads to larger step sizes, which limit how deeply the numerical Hamiltonian trajectories can venture into the funnel before diverging. The unstable trajectories explain the divergences, while the limited exploration explains why the E-FMI warning doesn’t show up. Increasing adapt_delta leads to a less aggressive adaptation and a smaller step size, which allows for more refined exploration that sees enough of the funnel for the E-FMI diagnostic to register the problem.
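
One way to see this coupling empirically is to compare the adapted step sizes of the two fits (a sketch reusing the fits a and b from above; stepsize__ is constant after warmup, so the per-chain mean just recovers the adapted value).

# Adapted step size per chain: funnel alone vs. funnel plus normal block
apply(a$sampler_diagnostics()[, , "stepsize__"], 2, mean)
apply(b$sampler_diagnostics()[, , "stepsize__"], 2, mean)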

As I mentioned above, all of these empirical diagnostics are only as good as our initial exploration (diagnostics that are not transparent about this limitation can be particularly dangerous).


Apologies for bumping an old thread, but how do we turn off the inverse metric adaptation?

Stan’s warmup adaptation proceeds in three stages: an initial window, a series of expanding windows, and then a final window. The metric adaptation occurs during those intermediate windows, so you can turn it off by removing the windows entirely; at that point the initial and final windows are redundant, so you would only want to keep one of them. In CmdStan you would do something like

./<model_name> sample num_warmup=N num_samples=1000 adapt init_buffer=N window=0 term_buffer=0

where N might be something like 100.
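
In cmdstanr the same configuration can be passed through $sample(); a sketch, assuming a compiled model object mod:

fit <- mod$sample(
  iter_warmup = 100,
  iter_sampling = 1000,
  init_buffer = 100,  # matches iter_warmup, as above
  window = 0,         # no metric-adaptation windows
  term_buffer = 0
)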
