How can I solve "bfmi-low" problem?

When I fit a model I get the following warnings:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

I tried increasing the number of iterations to 5000, the warmup to 3000, the stepsize to 0.01 the adapt_delta to 0.99, and the max_treedepth to 18. Alas, the problem persists. This is my model:

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] eta;
}

transformed parameters {
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}

and this is the pairs plot:

Capture

I have a few questions:

  1. What can I learn about my model from this pairs plot?

  2. Do you have any suggestions about how to solve the bfmi-low problem?

  3. Is this a problem that can be ignored? The documentation says “…chains likely did not explore the posterior distribution efficiently”. This to me sounds like the algorithm was slow (inefficient) but ultimately got to where it needed to get.

  4. What is the energy__ parameter and how do I plot it?

1 Like

Are you working via rstan in R? If so, I asked myself question 4 recently. The answer in R seems to be to use (assuming that your stanfit object is called “sfit”) something like the following:

param_tmp ← as.data.frame(extract(sfit, permuted=FALSE))
samppars_tmp ← get_sampler_params(sfit, inc_warmup = FALSE)
chain ← 1
param ← “tau”
plot(param_tmp[,paste(“chain:”,chain,“.”,param,sep=“”)],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16, xlab=param, ylab=“Energy”, cex.lab=1.5)

I hope that’s helpeful.

The problem for solving a bfmi-low problem for me has usually been to somehow re-parameterize the model. Thus, have you also tried a centered parameterization? I.e. going directly for theta ~ normal(mu, tau); ? Or what if you assign some weakly informative priors to regularize the model a bit?

I believe Michael Betancourt commented on the severity of the diagnostic in a previous thread. He’s also got a paper on the topic on arXiv ([1604.00695] Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo). After reading all this, I feel like it is a warning I should not ignore (but I’ve not run into a problem, yet, where I cannot get rid of it).

1 Like

If you are using R I have created an R function that allows you to obtain a nice pair plot that includes the energy.

pairs_stan <- function(chain, stan_model, pars) {
  energy <- as.matrix(sapply(get_sampler_params(stan_model, inc_warmup = F), 
                             function(x) x[,"energy__"]))
  pars <- extract(stan_model, pars = pars, permuted = F)
  df <- data.frame(energy[,chain], pars[,chain,])
  names(df)[1] <- "energy"
  GGally::ggpairs(df, title = paste0("Chain", chain), 
                  lower = list(continuous = GGally::wrap("points", alpha = 0.2)))                    
}
  • chain the chain that you want to plot
  • stan_model the stan fit object
  • pars a character vector containing the names of parameters that you want to plot together with energy

Please not that you will need to install the package GGally.

I hope you will find it useful.

Cheers,

Claudio

3 Likes

I think your function may have a bug. When i try to use it says:

Error in data.frame(energy[, x], pars[, x, ]) : object 'x' not found

Where do you define epcols?

You have weak identifiability of nu and tau and bimodal posterior.

Add priors for nu and tau.

I’m worried about the second mode not explored well enough, but then it’s possible that you don’t want that mode.

1 Like

I just tried the following modification:

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] theta;
}

transformed parameters {
  vector<lower = 0>[J] sigma;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}

And I get the same warning:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

@Bjoern could you elaborate a bit more on your weakly informative priors suggestion? What do you have in mind?

Thanks a lot, @avehtari. Do you have a suggestion about which weak priors I should use?

E.g. replace that with col=rgb(0,0,1,0.2).

I actually copy-pasted from a larger function I wrote for myself:

energyplot ← function(sfit, param=“logkappa”,chains=4, transp=0.2){
param_tmp ← as.data.frame(extract(sfit, permuted=FALSE))
samppars_tmp ← get_sampler_params(sfit, inc_warmup = FALSE)
epcols ← c(rgb(0,0,1,transp), rgb(1,0,0,transp), rgb(0,1,0,transp), rgb(1,1,0,transp), rgb(1,0,1,transp), rgb(0,1,1,transp) )
for (chain in 1:chains) {
if (chain==1) {
plot(param_tmp[,paste(“chain:”,chain,“.”,param,sep=“”)],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16, xlab=param, ylab=“Energy”, cex.lab=1.5)
} else {
points( param_tmp[, paste(“chain:”,chain,“.”,param,sep=“”)],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16)
}}
return()
}

Ops, sorry, I copied the wrong version. I have updated it, it should be working now :)

1 Like

Something with lighter tail than Cauchy. You could start by testing what happens with half-normals on nu and tau, just to see if adding priors make any difference.

Aki

I tried normal(0,10) and normal(0,1), but the problem does not go away.

Those are really wide compared to the original posterior. Right now I’m not worried that you would use too informative prior, now I just want to know what happens if you have informative prior. If that helps with the computation then you can start thinking harder about the priors or accept that sampling is going to be slow. Also you should think anyway whether that another smaller mode is plausible.

I tried many things, including the following:

iter <- 5000
warmup <- 2500
data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] eta;
}

transformed parameters {
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  nu ~ normal(2,0.5);
  tau ~ normal(0.1,0.01);
  y ~ normal(theta, sigma);
}
fit1 <- sampling(model_1, data = practices_dat, chains = 4,
                 show_messages = T, refresh = 500, cores=cores, 
                 control = list(adapt_delta = 0.99, stepsize = 0.01, max_treedepth = 18),
                 iter = iter, 
                 warmup = warmup) 

pairs_stan(chain=1, stan_model=fit1, pars=c("mu", "nu", "tau", "lp__"))

Capture

Alas, the error persists:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

Looks better

It’s not an error. It’s a warning, and this one and the warning limit we don’t understand yet so well. Maybe @betanalpha could help?

Try use a bivariate log-normal between tau and nu, specify
rho = - inv_logit(rho_raw)
rho_raw ~ normal(0, 2.0);

  real bilognormal_lpdf(vector x, vector mu, vector sig, real rho) {
    real z1 = (log(x[1]) - mu[1]) / sig[1];
    real z2 = (log(x[2]) - mu[2]) / sig[2];
    real invsquarerho = 1 - square(rho);

    return  exp( - (square(z1) + square(z2) - 2 * rho * z1 * z2) / (2.0 * invsquarerho))
     / (2.0 * pi() * x[1] * x[2] * sig[1] * sig[2] * sqrt(invsquarerho));
  }

The E-BFMI (really should be E-FMI) quantifies how well the momentum resampling in HMC works. Low values indicate that the sampler is not able to explore all of the relevant energy sets fast enough which may manifest as bias in the exploration of the parameter space.

For manipulating the energy diagnostic take a look at https://github.com/betanalpha/knitr_case_studies/blob/master/fitting_the_cauchy/stan_utility.R. In particular, run check_energy to see exactly what the E-FMI values are – RStan warns for values below 0.3 but we have since seen that dynamic HMC works reasonably well for values as low as 0.2.

If you still see E-FMI values lower than 0.2 then you’ll have to consider reparameterizations/new priors. In particular, in many cases low E-FMI seems to indicate really poor identifiability of the likelihood which will require careful prior choice to ensure a well-behaved joint model.

8 Likes

@andre.pfeuffer could you show me how to modify my stan program to incorporate your solution? I have never done something like this before. Thakns!

functions {
  real bilognormal_lpdf(vector x, vector mu, vector sig, real rho) {
    real z1 = (log(x[1]) - mu[1]) / sig[1];
    real z2 = (log(x[2]) - mu[2]) / sig[2];
    real invsquarerho = 1 - square(rho);

    return  exp( - (square(z1) + square(z2) - 2 * rho * z1 * z2) / (2.0 * invsquarerho))
     / (2.0 * pi() * x[1] * x[2] * sig[1] * sig[2] * sqrt(invsquarerho));
  }
}

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  vector<lower=0>[2] nu_tau;
  real rho_raw;
  vector[J] eta;
}

transformed parameters {
  real<lower=-1, upper=0>rho = - inv_logit(rho_raw);
  real<lower=0> nu = nu_tau[1];
  real<lower=0> tau = nu_tau[2];     // sd between practices
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  rho_raw ~ normal(0, 2);
  target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
  y ~ normal(theta, sigma); 
}

Calculations, regarding mu and sigma in bilognormal_lpdf (Hope I got it right)
based upon:

log(2) - 0.5^2/2
[1] 0.5681472
log(.1) - 0.05^2/2
[1] -2.303835
log(1+0.5^2/2^2)^0.5
[1] 0.2462207
log(1+0.1^2/0.05^2)^0.5
[1] 1.268636

Thanks @andre.pfeuffer . I’m getting this error when I try to compile the model:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
  error in 'model27ebc53ae3cc1_stan_27ebc2609667' at line 39, column 36
  -------------------------------------------------
    37:   rho_raw ~ normal(0, 2);
    38:   // see (6), (7) in http://www.academicjournals.org/article/article1380633488_Yerel%20and%20Konuk.pdf
    39:   target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
                                           ^
    40:   y ~ normal(theta, sigma); 
  -------------------------------------------------

PARSER EXPECTED: "|"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-27ebc2609667' due to the above error.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-27ebc2609667' due to the above error.

All I did was copy and paste… :-(

1 Like