Low E-BFMI examples

Can somebody suggest a good example (or several) of a model that runs without divergences but yields low E-BFMI? The data must be simulated or otherwise shareable. Ideally, the model would also yield decent R-hat for all parameters, but this isn’t crucial. Bonus points for models that are maximally simple to understand and fast to fit.

parameters {
  real x;
}
model {
  // nu = 0.1 gives extremely heavy tails
  x ~ student_t(0.1, 0, 1);
}

yields no divergent transitions with adapt_delta=.99 and an E-BFMI of roughly 0.1, but plenty of max-treedepth warnings and bad R-hat.
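
For concreteness, here is a minimal cmdstanr sketch of that check; the file name heavy_tail.stan is hypothetical and just refers to the Stan program above.

library(cmdstanr)

mod <- cmdstan_model("heavy_tail.stan")  # the Stan program above
fit <- mod$sample(adapt_delta = 0.99)

# Prints divergences, max-treedepth hits, and E-BFMI per chain; per the
# description above, expect E-BFMI around 0.1 with no divergences.
fit$diagnostic_summary()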


You can tweak a funnel to manifest E-FMI problems without divergences (or with only a few, depending on the precise details). See, for example, the nine-dimensional example in Hierarchical Modeling. The higher the dimension of the funnel, the more the E-FMI problem dominates over divergences, but be careful: for high-enough dimensions the chains might not even try to explore small enough values for the empirical E-FMI diagnostic to notice (see, for example, the difference between chains 1/2 and chains 3/4 in that example).
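
A minimal sketch of that kind of experiment with cmdstanr is below; the model form (a Neal-style centered funnel with adjustable dimension N) and its scales are assumptions on my part, not the exact model from that case study.

library(cmdstanr)

# A centered funnel whose dimension N can be raised to trade
# divergences for E-FMI problems (assumed form)
funnel_code <- "
data {
  int<lower=1> N;
}
parameters {
  real tau;
  vector[N] theta;
}
model {
  tau ~ normal(0, 3);
  theta ~ normal(0, exp(tau / 2));
}
"
funnel_mod <- cmdstan_model(write_stan_file(funnel_code))
fit <- funnel_mod$sample(data = list(N = 9))
fit$diagnostic_summary()  # expect the E-BFMI warning to dominate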

In lower dimensions a heavy-tailed target also runs into E-FMI problems. See, for example, Section 4.2 of the original E-FMI paper, [1604.00695] Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo. That said, when fitting these models you have to be careful to turn off the inverse metric adaptation, because the variances that the adaptation is trying to estimate no longer exist.


I’ve noticed that if I start with a funnel that yields low E-FMI and no divergences and then add a big multivariate normal to the model (fixed sd, no funnel), the E-FMI problems disappear but divergences start popping up. If I then increase adapt_delta I can squash the divergences, but before they are gone entirely I usually start to see E-FMI problems again. The complementarity of E-FMI and divergences as diagnostics is not something I had previously appreciated (other than in the vaguest sense of “both metrics are useful”) and is really cool!

data {
  int<lower=0> K;  // number of funnel parameters
  int<lower=0> N;  // number of independent standard-normal parameters
}

parameters {
  real<lower=0> tau;
  vector[K] phi;
  vector[N] y;
}

model {
  tau ~ normal(0, 5);
  phi ~ normal(0, tau);  // centered funnel
  y ~ std_normal();      // independent of the funnel
}

library(cmdstanr)
fmod <- cmdstan_model("funnel2.stan")

set.seed(123)

# funnel only: E-FMI warnings but no divergences
data <- list("K" = 20, "N" = 0)
a <- fmod$sample(data = data)
max(a$summary()$rhat)

# funnel plus a big independent normal block: divergences, no E-FMI warning
data <- list("K" = 20, "N" = 80)
b <- fmod$sample(data = data)
max(b$summary()$rhat)

# raising adapt_delta squashes the divergences, but the E-FMI warnings return
data <- list("K" = 20, "N" = 80)
d <- fmod$sample(data = data, adapt_delta = 0.99)
max(d$summary()$rhat)
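
The diagnostics themselves can also be pulled out explicitly; this is just cmdstanr’s built-in summary applied to the three fits above, with the comments restating the behavior I described.

a$diagnostic_summary()  # low E-BFMI, no divergences
b$diagnostic_summary()  # divergences, no E-BFMI warning
d$diagnostic_summary()  # divergences squashed, low E-BFMI back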

Because the y are independent of the funnel parameters tau and phi, this is almost surely due to adaptation, which in some sense couples the parameters even if they’re probabilistically independent.

The standard normal target for the y motivates a much higher step size than the funnel parameters do, and overall this leads to a more aggressive adaptation. The more aggressive adaptation then leads to larger step sizes, which limit how deeply the numerical Hamiltonian trajectories can venture into the funnel before diverging. The unstable trajectories explain the divergences, while the limited exploration explains why the E-FMI warning doesn’t show up. Increasing adapt_delta leads to a less aggressive adaptation and a smaller step size, which allows for more refined exploration that sees enough of the funnel for the E-FMI diagnostic to register the problem.
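
One way to see this coupling empirically is to compare the adapted step sizes of the two fits (a sketch reusing the fits a and b from above; stepsize__ is constant after warmup, so the per-chain mean just recovers the adapted value).

# Adapted step size per chain: funnel alone vs. funnel plus normal block
apply(a$sampler_diagnostics()[, , "stepsize__"], 2, mean)
apply(b$sampler_diagnostics()[, , "stepsize__"], 2, mean)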

As I mentioned above, all of these empirical diagnostics are only as good as our initial exploration (diagnostics that are not transparent about this limitation can be particularly dangerous).


Apologies for bumping an old thread, but how do we turn off the inverse metric adaptation?

Stan’s warmup adaptation proceeds in three stages: an initial window, a series of expanding windows, and then a final window. The metric adaptation occurs during those intermediate windows, so you can turn it off by removing the windows entirely; at that point the initial and final windows are redundant, so you would only want to keep one of them. In CmdStan you would do something like

./<model_name> sample num_warmup=N num_samples=1000 adapt init_buffer=N window=0 term_buffer=0

where N might be something like 100.
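
In cmdstanr the same configuration can be passed through $sample(); a sketch, assuming a compiled model object mod:

fit <- mod$sample(
  iter_warmup = 100,
  iter_sampling = 1000,
  init_buffer = 100,  # matches iter_warmup, as above
  window = 0,         # no metric-adaptation windows
  term_buffer = 0
)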
