How can I solve "bfmi-low" problem?


#4

I think your function may have a bug. When i try to use it says:

Error in data.frame(energy[, x], pars[, x, ]) : object 'x' not found

#5

Where do you define epcols?


#6

You have weak identifiability of nu and tau and bimodal posterior.

Add priors for nu and tau.

I’m worried about the second mode not explored well enough, but then it’s possible that you don’t want that mode.


#7

I just tried the following modification:

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] theta;
}

transformed parameters {
  vector<lower = 0>[J] sigma;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}

And I get the same warning:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

@Bjoern could you elaborate a bit more on your weakly informative priors suggestion? What do you have in mind?


#8

Thanks a lot, @avehtari. Do you have a suggestion about which weak priors I should use?


#9

E.g. replace that with col=rgb(0,0,1,0.2).

I actually copy-pasted from a larger function I wrote for myself:

energyplot <- function(sfit, param=“logkappa”,chains=4, transp=0.2){
param_tmp <- as.data.frame(extract(sfit, permuted=FALSE))
samppars_tmp <- get_sampler_params(sfit, inc_warmup = FALSE)
epcols <- c(rgb(0,0,1,transp), rgb(1,0,0,transp), rgb(0,1,0,transp), rgb(1,1,0,transp), rgb(1,0,1,transp), rgb(0,1,1,transp) )
for (chain in 1:chains) {
if (chain==1) {
plot(param_tmp[,paste(“chain:”,chain,".",param,sep="")],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16, xlab=param, ylab=“Energy”, cex.lab=1.5)
} else {
points( param_tmp[, paste(“chain:”,chain,".",param,sep="")],
samppars_tmp[[chain]][,“energy__”],
col=epcols[chain], pch=16)
}}
return()
}


#11

Ops, sorry, I copied the wrong version. I have updated it, it should be working now :)


#12

Something with lighter tail than Cauchy. You could start by testing what happens with half-normals on nu and tau, just to see if adding priors make any difference.

Aki


#13

I tried normal(0,10) and normal(0,1), but the problem does not go away.


#14

Those are really wide compared to the original posterior. Right now I’m not worried that you would use too informative prior, now I just want to know what happens if you have informative prior. If that helps with the computation then you can start thinking harder about the priors or accept that sampling is going to be slow. Also you should think anyway whether that another smaller mode is plausible.


#15

I tried many things, including the following:

iter <- 5000
warmup <- 2500
data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  real<lower=0> nu;           // 
  real<lower=0> tau;          // sd between practices
  vector[J] eta;
}

transformed parameters {
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  nu ~ normal(2,0.5);
  tau ~ normal(0.1,0.01);
  y ~ normal(theta, sigma);
}
fit1 <- sampling(model_1, data = practices_dat, chains = 4,
                 show_messages = T, refresh = 500, cores=cores, 
                 control = list(adapt_delta = 0.99, stepsize = 0.01, max_treedepth = 18),
                 iter = iter, 
                 warmup = warmup) 

pairs_stan(chain=1, stan_model=fit1, pars=c("mu", "nu", "tau", "lp__"))

Capture

Alas, the error persists:

There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-lowExamine the pairs() plot to diagnose sampling problems

#16

Looks better

It’s not an error. It’s a warning, and this one and the warning limit we don’t understand yet so well. Maybe @betanalpha could help?


#17

Try use a bivariate log-normal between tau and nu, specify
rho = - inv_logit(rho_raw)
rho_raw ~ normal(0, 2.0);

  real bilognormal_lpdf(vector x, vector mu, vector sig, real rho) {
    real z1 = (log(x[1]) - mu[1]) / sig[1];
    real z2 = (log(x[2]) - mu[2]) / sig[2];
    real invsquarerho = 1 - square(rho);

    return  exp( - (square(z1) + square(z2) - 2 * rho * z1 * z2) / (2.0 * invsquarerho))
     / (2.0 * pi() * x[1] * x[2] * sig[1] * sig[2] * sqrt(invsquarerho));
  }

#18

The E-BFMI (really should be E-FMI) quantifies how well the momentum resampling in HMC works. Low values indicate that the sampler is not able to explore all of the relevant energy sets fast enough which may manifest as bias in the exploration of the parameter space.

For manipulating the energy diagnostic take a look at https://github.com/betanalpha/knitr_case_studies/blob/master/fitting_the_cauchy/stan_utility.R. In particular, run check_energy to see exactly what the E-FMI values are – RStan warns for values below 0.3 but we have since seen that dynamic HMC works reasonably well for values as low as 0.2.

If you still see E-FMI values lower than 0.2 then you’ll have to consider reparameterizations/new priors. In particular, in many cases low E-FMI seems to indicate really poor identifiability of the likelihood which will require careful prior choice to ensure a well-behaved joint model.


#19

@andre.pfeuffer could you show me how to modify my stan program to incorporate your solution? I have never done something like this before. Thakns!


#20
functions {
  real bilognormal_lpdf(vector x, vector mu, vector sig, real rho) {
    real z1 = (log(x[1]) - mu[1]) / sig[1];
    real z2 = (log(x[2]) - mu[2]) / sig[2];
    real invsquarerho = 1 - square(rho);

    return  exp( - (square(z1) + square(z2) - 2 * rho * z1 * z2) / (2.0 * invsquarerho))
     / (2.0 * pi() * x[1] * x[2] * sig[1] * sig[2] * sqrt(invsquarerho));
  }
}

data {
  int<lower=0> J;             // number of practices 
  real y[J];                  // estimated  effect
  int<lower=0> n[J];          // number of Medicare beneficiaries 
}

parameters {
  real mu;                    // mean for practice
  vector<lower=0>[2] nu_tau;
  real rho_raw;
  vector[J] eta;
}

transformed parameters {
  real<lower=-1, upper=0>rho = - inv_logit(rho_raw);
  real<lower=0> nu = nu_tau[1];
  real<lower=0> tau = nu_tau[2];     // sd between practices
  vector[J] theta; 
  vector<lower = 0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu/sqrt(n[j]);
}

model {
  eta ~ normal(0, 1);
  rho_raw ~ normal(0, 2);
  target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
  y ~ normal(theta, sigma); 
}

Calculations, regarding mu and sigma in bilognormal_lpdf (Hope I got it right)
based upon:

log(2) - 0.5^2/2
[1] 0.5681472
log(.1) - 0.05^2/2
[1] -2.303835
log(1+0.5^2/2^2)^0.5
[1] 0.2462207
log(1+0.1^2/0.05^2)^0.5
[1] 1.268636


#21

Thanks @andre.pfeuffer . I’m getting this error when I try to compile the model:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
  error in 'model27ebc53ae3cc1_stan_27ebc2609667' at line 39, column 36
  -------------------------------------------------
    37:   rho_raw ~ normal(0, 2);
    38:   // see (6), (7) in http://www.academicjournals.org/article/article1380633488_Yerel%20and%20Konuk.pdf
    39:   target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
                                           ^
    40:   y ~ normal(theta, sigma); 
  -------------------------------------------------

PARSER EXPECTED: "|"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-27ebc2609667' due to the above error.
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'stan-27ebc2609667' due to the above error.

All I did was copy and paste… :-(


#22

should be:

target += bilognormal_lpdf(nu_tau | [0.57, -2.3]’, [0.2462207, 1.26]’, rho);

Sorry, it was the uncorrected model in my copy&paste, since I got problems with discourse
formatting, I re-pasted the “old” model. I checked again, should compile now. But its after midnight,
my time, time is already late for me.


#24

I’m working to extend my model with an AR(1) component. I also ran into the low energy issue. I’ve done some background reading on “energy” and also read Michael’s low energy guide.

I used @claudiofronterre’s pairs function above that focuses on “energy” with the mu hyperparameter and three of the ~ 20 or so group specific estimates. As an aside @claudiofronterre, somehow my extract function was masked in the global env, so I had to add rstan::extract.

Am I right to conclude this bimodality in mu_phi is a likely source of the issue?

Here’s the model for reference,

data {
  int<lower=1> J; // number of apps
  int<lower=1> N; // number of observations, the rows
  int K; // number of features, the columns
  int<lower=0> y[N]; // dependent variable index by the number of rows
  matrix[N, K] X; // trend covariates
  matrix[N, K] X2; // action covariates
  matrix[N, K] X3; // lagged y
  int IDs[N]; // apps
}
parameters {
  real mu_alpha_raw;
  real<lower=0> tau_alpha_raw; // hyper parameter for alpha variance
  // I split out the intercept from other paraters in the design matrix;
  vector[J] alpha; // the intercepts indexed by J apps
  vector[K] beta; // a feature parameter K in length
  vector[K] action_slope; // a feature parameter K in length
  vector<lower=0>[J] theta;
  real<lower=0> mu_theta;
  real<lower=0> tau_theta;
  real mu_action_slope_raw;
  real<lower=0> tau_action_slope_raw;
  real mu_beta_raw;
  real<lower=0> tau_beta_raw;
  real mu_phi;
  real tau_phi;
  vector[K] phi;
}
transformed parameters {
  real mu_action_slope = 0.5 + 0.2 * mu_action_slope_raw;
  real<lower=0> tau_action_slope = 0.6 + 0.05 * tau_action_slope_raw;
  real mu_beta =  (0.05 * mu_beta_raw) - 0.3;
  real<lower=0> tau_beta = 0.07 + 0.05 * tau_beta_raw;
}
model {
  real mu_alpha;
  real tau_alpha;

  mu_phi ~ normal(0, 0.25);
  tau_phi ~ normal(0.5, 0.5);
  phi ~ normal(mu_phi, tau_phi);

  mu_theta ~ normal(1, 5);
  tau_theta ~ normal(3.6, 4);
  theta ~ normal(mu_theta, tau_theta);

  mu_alpha = 0.56 + 0.2 * mu_alpha_raw;
  mu_alpha_raw ~ normal(0, 1);
  tau_alpha = 2.4 + 0.3 * tau_alpha_raw;
  tau_alpha_raw ~ normal(0, 1);
  alpha ~ normal(mu_alpha, tau_alpha);

  mu_beta_raw ~ normal(0, 1);
  tau_beta_raw ~ normal(0, 1);
  beta ~ normal(mu_beta, tau_beta);

  mu_action_slope_raw ~ normal(0, 1);
  tau_action_slope_raw ~ normal(0, 0.05);
  action_slope ~ normal(mu_action_slope, tau_action_slope);

  y ~ neg_binomial_2_log(alpha[IDs] + X * beta + X2 * action_slope + X3 * phi, theta[IDs]);
}

EDIT: perhaps it was naive to slap a simple AR(1) process into this model? It looks like the condition (?) of stationarity is not so simple for discrete distributions (another source). For those following along, here’s a previous post where Bob walks through a Poisson AR(1) model.


#25

The first thing I’d try here is non-centering phi.