There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Examine the pairs() plot to diagnose sampling problems
I tried increasing the number of iterations to 5000, the warmup to 3000, the stepsize to 0.01, the adapt_delta to 0.99, and the max_treedepth to 18. Alas, the problem persists. This is my model:
data {
  int<lower=0> J;        // number of practices
  real y[J];             // estimated effect
  int<lower=0> n[J];     // number of Medicare beneficiaries
}
parameters {
  real mu;               // overall mean across practices
  real<lower=0> nu;      // scale of the within-practice sd (sigma = nu / sqrt(n))
  real<lower=0> tau;     // sd between practices
  vector[J] eta;         // standardized practice effects (non-centered)
}
transformed parameters {
  vector[J] theta;
  vector<lower=0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu / sqrt(n[j]);
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
and this is the pairs plot:
I have a few questions:
What can I learn about my model from this pairs plot?
Do you have any suggestions about how to solve the bfmi-low problem?
Is this a problem that can be ignored? The documentation says “…chains likely did not explore the posterior distribution efficiently”. To me this sounds like the algorithm was slow (inefficient) but ultimately got where it needed to go.
What is the energy__ parameter and how do I plot it?
Are you working via rstan in R? If so, I asked myself question 4 recently. The answer in R seems to be to use (assuming that your stanfit object is called “sfit”) something like the following:
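A minimal sketch of what I mean, using get_sampler_params() to pull the energy__ column out of the sampler diagnostics (the chain index and plotting style here are just illustrative):
library(rstan)
# per-iteration NUTS diagnostics, one matrix per chain
sp <- get_sampler_params(sfit, inc_warmup = FALSE)
# extract the energy__ column for each chain
energy <- lapply(sp, function(x) x[, "energy__"])
# trace plot of the energy for the first chain
plot(energy[[1]], type = "l", xlab = "iteration", ylab = "energy__")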
The key to solving a bfmi-low problem, for me, has usually been to somehow re-parameterize the model. So, have you also tried a centered parameterization, i.e. going directly for theta ~ normal(mu, tau);? Or what if you assign some weakly informative priors to regularize the model a bit?
I believe Michael Betancourt commented on the severity of the diagnostic in a previous thread. He’s also got a paper on the topic on arXiv ([1604.00695] Diagnosing Suboptimal Cotangent Disintegrations in Hamiltonian Monte Carlo). After reading all this, I feel like it is a warning I should not ignore (but I’ve not run into a problem, yet, where I cannot get rid of it).
data {
  int<lower=0> J;        // number of practices
  real y[J];             // estimated effect
  int<lower=0> n[J];     // number of Medicare beneficiaries
}
parameters {
  real mu;               // overall mean across practices
  real<lower=0> nu;      // scale of the within-practice sd (sigma = nu / sqrt(n))
  real<lower=0> tau;     // sd between practices
  vector[J] theta;       // practice effects (centered parameterization)
}
transformed parameters {
  vector<lower=0>[J] sigma;
  for (j in 1:J) sigma[j] = nu / sqrt(n[j]);
}
model {
  theta ~ normal(mu, tau);
  y ~ normal(theta, sigma);
}
And I get the same warning:
There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Examine the pairs() plot to diagnose sampling problems
@Bjoern could you elaborate a bit more on your weakly informative priors suggestion? What do you have in mind?
Something with a lighter tail than the Cauchy. You could start by testing what happens with half-normals on nu and tau, just to see if adding priors makes any difference.
Those are really wide compared to the original posterior. Right now I’m not worried that you would use too informative a prior; I just want to know what happens if you have an informative prior. If that helps with the computation, then you can start thinking harder about the priors, or accept that sampling is going to be slow. You should also think about whether that other, smaller mode is plausible anyway.
data {
  int<lower=0> J;        // number of practices
  real y[J];             // estimated effect
  int<lower=0> n[J];     // number of Medicare beneficiaries
}
parameters {
  real mu;               // overall mean across practices
  real<lower=0> nu;      // scale of the within-practice sd (sigma = nu / sqrt(n))
  real<lower=0> tau;     // sd between practices
  vector[J] eta;         // standardized practice effects (non-centered)
}
transformed parameters {
  vector[J] theta;
  vector<lower=0>[J] sigma;
  theta = mu + tau * eta;
  for (j in 1:J) sigma[j] = nu / sqrt(n[j]);
}
model {
  eta ~ normal(0, 1);
  nu ~ normal(2, 0.5);       // priors on the scales (truncated at 0 by the constraints)
  tau ~ normal(0.1, 0.01);
  y ~ normal(theta, sigma);
}
There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
Examine the pairs() plot to diagnose sampling problems
The E-BFMI (really should be E-FMI) quantifies how well the momentum resampling in HMC works. Low values indicate that the sampler is not able to explore all of the relevant energy sets fast enough, which may manifest as bias in the exploration of the parameter space.
If you still see E-FMI values lower than 0.2 then you’ll have to consider reparameterizations/new priors. In particular, in many cases low E-FMI seems to indicate really poor identifiability of the likelihood which will require careful prior choice to ensure a well-behaved joint model.
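If it helps, here is a rough sketch of how to compute the per-chain E-BFMI estimate yourself in rstan (assuming the stanfit object is again called sfit; the 0.2 cutoff is the one mentioned above):
sp <- get_sampler_params(sfit, inc_warmup = FALSE)
ebfmi <- sapply(sp, function(x) {
  e <- x[, "energy__"]
  # empirical E-BFMI: squared energy jumps relative to the total energy variation
  sum(diff(e)^2) / sum((e - mean(e))^2)
})
ebfmi  # chains with values below roughly 0.2 are the ones being flagged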
Thanks @andre.pfeuffer. I’m getting this error when I try to compile the model:
SYNTAX ERROR, MESSAGE(S) FROM PARSER:
Probabilty functions with suffixes _lpdf, _lpmf, _lcdf, and _lccdf,
require a vertical bar (|) between the first two arguments.
error in 'model27ebc53ae3cc1_stan_27ebc2609667' at line 39, column 36
-------------------------------------------------
37: rho_raw ~ normal(0, 2);
38: // see (6), (7) in http://www.academicjournals.org/article/article1380633488_Yerel%20and%20Konuk.pdf
39: target += bilognormal_lpdf(nu_tau, [0.57, -2.3]', [0.2462207, 1.26]', rho);
^
40: y ~ normal(theta, sigma);
-------------------------------------------------
PARSER EXPECTED: "|"
Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model 'stan-27ebc2609667' due to the above error.
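In case it helps: the parser is asking for a vertical bar between the variate and the parameters whenever an _lpdf function is called directly, so presumably line 39 needs to read
target += bilognormal_lpdf(nu_tau | [0.57, -2.3]', [0.2462207, 1.26]', rho);
rather than separating the first two arguments with a comma.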