How can I solve "bfmi-low" problem?

ignacio · January 17, 2018, 2:21pm

When I fit a model I get the following warnings:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

I tried increasing the number of iterations to 5000, the warmup to 3000, the stepsize to 0.01 the adapt_delta to 0.99, and the max_treedepth to 18. Alas, the problem persists. This is my model:

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] eta;
}

transformed parameters {
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}

and this is the pairs plot:

Capture

I have a few questions:

What can I learn about my model from this pairs plot?
Do you have any suggestions about how to solve the bfmi-low problem?
Is this a problem that can be ignored? The documentation says “…chains likely did not explore the posterior distribution efficiently”. This to me sounds like the algorithm was slow (inefficient) but ultimately got to where it needed to get.
What is the energy__ parameter and how do I plot it?

Bjoern · January 17, 2018, 3:43pm

Are you working via rstan in R? If so, I asked myself question 4 recently. The answer in R seems to be to use (assuming that your stanfit object is called “sfit”) something like the following:

param_tmp ← as.data.frame(extract(sfit, permuted=FALSE))
samppars_tmp ← get_sampler_params(sfit, inc_warmup = FALSE)
chain ← 1
param ← “tau”
plot(param_tmp[,paste(“chain:”,chain,“.”,param,sep=“”)],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16, xlab=param, ylab=“Energy”, cex.lab=1.5)

I hope that’s helpeful.

The problem for solving a bfmi-low problem for me has usually been to somehow re-parameterize the model. Thus, have you also tried a centered parameterization? I.e. going directly for theta ~ normal(mu, tau); ? Or what if you assign some weakly informative priors to regularize the model a bit?

I believe Michael Betancourt commented on the severity of the diagnostic in a previous thread. He’s also got a paper on the topic on arXiv ([1604.00695] Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo). After reading all this, I feel like it is a warning I should not ignore (but I’ve not run into a problem, yet, where I cannot get rid of it).

claudiofronterre · January 17, 2018, 4:00pm

If you are using R I have created an R function that allows you to obtain a nice pair plot that includes the energy.

pairs_stan <- function(chain, stan_model, pars) {
  energy <- as.matrix(sapply(get_sampler_params(stan_model, inc_warmup = F), 
                             function(x) x[,"energy__"]))
  pars <- extract(stan_model, pars = pars, permuted = F)
  df <- data.frame(energy[,chain], pars[,chain,])
  names(df)[1] <- "energy"
  GGally::ggpairs(df, title = paste0("Chain", chain), 
                  lower = list(continuous = GGally::wrap("points", alpha = 0.2)))                    
}

chain the chain that you want to plot
stan_model the stan fit object
pars a character vector containing the names of parameters that you want to plot together with energy

Please not that you will need to install the package GGally.

I hope you will find it useful.

Cheers,

Claudio

ignacio · January 17, 2018, 4:09pm

I think your function may have a bug. When i try to use it says:

Error in data.frame(energy[, x], pars[, x, ]) : object 'x' not found

ignacio · January 17, 2018, 4:11pm

Where do you define epcols?

avehtari · January 17, 2018, 4:16pm

You have weak identifiability of nu and tau and bimodal posterior.

Add priors for nu and tau.

I’m worried about the second mode not explored well enough, but then it’s possible that you don’t want that mode.

ignacio · January 17, 2018, 4:17pm

I just tried the following modification:

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] theta;
}

transformed parameters {
  vector<lower = 0>[J] sigma;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}

And I get the same warning:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

@Bjoern could you elaborate a bit more on your weakly informative priors suggestion? What do you have in mind?

ignacio · January 17, 2018, 4:20pm

Thanks a lot, @avehtari. Do you have a suggestion about which weak priors I should use?

Bjoern · January 17, 2018, 4:21pm

E.g. replace that with col=rgb(0,0,1,0.2).

I actually copy-pasted from a larger function I wrote for myself:

energyplot ← function(sfit, param=“logkappa”,chains=4, transp=0.2){
param_tmp ← as.data.frame(extract(sfit, permuted=FALSE))
samppars_tmp ← get_sampler_params(sfit, inc_warmup = FALSE)
epcols ← c(rgb(0,0,1,transp), rgb(1,0,0,transp), rgb(0,1,0,transp), rgb(1,1,0,transp), rgb(1,0,1,transp), rgb(0,1,1,transp) )
for (chain in 1:chains) {
if (chain==1) {
plot(param_tmp[,paste(“chain:”,chain,“.”,param,sep=“”)],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16, xlab=param, ylab=“Energy”, cex.lab=1.5)
} else {
points( param_tmp[, paste(“chain:”,chain,“.”,param,sep=“”)],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16)
}}
return()
}

claudiofronterre · January 17, 2018, 4:23pm

Ops, sorry, I copied the wrong version. I have updated it, it should be working now :)

avehtari · January 17, 2018, 5:07pm

Something with lighter tail than Cauchy. You could start by testing what happens with half-normals on nu and tau, just to see if adding priors make any difference.

Aki

ignacio · January 17, 2018, 6:05pm

I tried normal(0,10) and normal(0,1), but the problem does not go away.

avehtari · January 17, 2018, 6:15pm

Those are really wide compared to the original posterior. Right now I’m not worried that you would use too informative prior, now I just want to know what happens if you have informative prior. If that helps with the computation then you can start thinking harder about the priors or accept that sampling is going to be slow. Also you should think anyway whether that another smaller mode is plausible.

ignacio · January 17, 2018, 7:49pm

I tried many things, including the following:

iter <- 5000
warmup <- 2500

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] eta;
}

transformed parameters {
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  nu ~ normal(2,0.5);
  tau ~ normal(0.1,0.01);
  y ~ normal(theta, sigma);
}

fit1 <- sampling(model_1, data = practices_dat, chains = 4,
                 show_messages = T, refresh = 500, cores=cores, 
                 control = list(adapt_delta = 0.99, stepsize = 0.01, max_treedepth = 18),
                 iter = iter, 
                 warmup = warmup) 

pairs_stan(chain=1, stan_model=fit1, pars=c("mu", "nu", "tau", "lp__"))

Capture

Alas, the error persists:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

avehtari · January 17, 2018, 8:29pm

Looks better

It’s not an error. It’s a warning, and this one and the warning limit we don’t understand yet so well. Maybe @betanalpha could help?

andre.pfeuffer · January 18, 2018, 3:59pm

Try use a bivariate log-normal between tau and nu, specify
rho = - inv_logit(rho_raw)
rho_raw ~ normal(0, 2.0);

  real bilognormal_lpdf(vector x, vector mu, vector sig, real rho) {
    real z1 = (log(x[1]) - mu[1]) / sig[1];
    real z2 = (log(x[2]) - mu[2]) / sig[2];
    real invsquarerho = 1 - square(rho);

    return  exp( - (square(z1) + square(z2) - 2 * rho * z1 * z2) / (2.0 * invsquarerho))
     / (2.0 * pi() * x[1] * x[2] * sig[1] * sig[2] * sqrt(invsquarerho));
  }

betanalpha · January 18, 2018, 4:07pm

The E-BFMI (really should be E-FMI) quantifies how well the momentum resampling in HMC works. Low values indicate that the sampler is not able to explore all of the relevant energy sets fast enough which may manifest as bias in the exploration of the parameter space.

For manipulating the energy diagnostic take a look at https://github.com/betanalpha/knitr_case_studies/blob/master/fitting_the_cauchy/stan_utility.R. In particular, run check_energy to see exactly what the E-FMI values are – RStan warns for values below 0.3 but we have since seen that dynamic HMC works reasonably well for values as low as 0.2.

If you still see E-FMI values lower than 0.2 then you’ll have to consider reparameterizations/new priors. In particular, in many cases low E-FMI seems to indicate really poor identifiability of the likelihood which will require careful prior choice to ensure a well-behaved joint model.

ignacio · January 18, 2018, 4:27pm

@andre.pfeuffer could you show me how to modify my stan program to incorporate your solution? I have never done something like this before. Thakns!

andre.pfeuffer · January 18, 2018, 5:12pm

functions {
  real bilognormal_lpdf(vector x, vector mu, vector sig, real rho) {
    real z1 = (log(x[1]) - mu[1]) / sig[1];
    real z2 = (log(x[2]) - mu[2]) / sig[2];
    real invsquarerho = 1 - square(rho);

    return  exp( - (square(z1) + square(z2) - 2 * rho * z1 * z2) / (2.0 * invsquarerho))
     / (2.0 * pi() * x[1] * x[2] * sig[1] * sig[2] * sqrt(invsquarerho));
  }
}

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  vector<lower=0>[2] nu_tau;
  real rho_raw;
  vector[J] eta;
}

transformed parameters {
  real<lower=-1, upper=0>rho = - inv_logit(rho_raw);
  real<lower=0> nu = nu_tau[1];
  real<lower=0> tau = nu_tau[2];     // sd between practices
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  rho_raw ~ normal(0, 2);
  target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
  y ~ normal(theta, sigma); 
}

Calculations, regarding mu and sigma in bilognormal_lpdf (Hope I got it right)
based upon:

log(2) - 0.5^2/2
[1] 0.5681472
log(.1) - 0.05^2/2
[1] -2.303835
log(1+0.5^2/2^2)^0.5
[1] 0.2462207
log(1+0.1^2/0.05^2)^0.5
[1] 1.268636

ignacio · January 18, 2018, 5:23pm

Thanks @andre.pfeuffer . I’m getting this error when I try to compile the model:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
  error in 'model27ebc53ae3cc1_stan_27ebc2609667' at line 39, column 36
  -------------------------------------------------
    37:   rho_raw ~ normal(0, 2);
    38:   // see (6), (7) in http://www.academicjournals.org/article/article1380633488_Yerel%20and%20Konuk.pdf
    39:   target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
                                           ^
    40:   y ~ normal(theta, sigma); 
  -------------------------------------------------

PARSER EXPECTED: "|"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-27ebc2609667' due to the above error.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-27ebc2609667' due to the above error.

All I did was copy and paste… :-(

Topic		Replies	Views
Bfmi-low problem with rstanarm after using parameters model_parameters rstanarm	0	537	March 18, 2022
How to address warnings about low ESS and low BFMI? Modeling	35	1170	February 14, 2024
Divergent transitions & BFMI low in a state-space model Modeling	6	1014	August 17, 2017
Issues with E-BFMI (missing doc, confusing name, etc.) Developers	17	758	June 14, 2021
Fixing "BFMI was low" problems Modeling fitting-issues , performance	11	3847	December 21, 2022

How can I solve "bfmi-low" problem?

Related topics