Avoiding "estimated Bayesian Fraction of Missing Information was low" warning by using a lognormal prior for standard deviation

Dear Stan forum members,

I’m trying to set up a simple intercept-only model with a lognormal likelihood and an adaptive prior on the varying intercepts a_an. The adaptive part is the prior on sigma_an:

data{
  int<lower=1> N;
  int<lower=1> N_an;
  int<lower=1> N_st;
  int<lower=1> N_gr;
  int idx_an[N];
  int idx_st[N];
  int idx_gr[N];
  real<lower=0> Area_s[N];
}
parameters{
  real a;
  vector[N_an] a_an;
  vector[N_st] a_st;
  vector[N_gr] a_gr;
  real<lower=0> sigma_an;
  real<lower=0> sigma;
}
model{
  vector[N] mu;
  sigma ~ cauchy(0, 1);
  sigma_an ~ lognormal(0, 1); // half-Cauchy, half-normal, exponential priors give bfmi-low warning
  a_an ~ normal(0, sigma_an);
  a_st ~ normal(0, 1);
  a_gr ~ normal(0, 1);
  a ~ normal(1, 1);
  for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
  Area_s ~ lognormal(mu, sigma);
}

If, for sigma_an, I use the recommended half-Cauchy, half-normal, half-t or exponential prior (with a variety of scale parameters), I always get an “estimated Bayesian Fraction of Missing Information was low” warning (iter = 5000 and warmup = 1000). As I understand it, this means the chains could not sufficiently explore the posterior, so it is probably not a warning to ignore. The usual recommendations are 1) reparameterization and 2) more warmup iterations. More warmup did not solve the problem (I tried up to warmup = 3000), and the only reparameterization I can think of is non-centering the a_an intercepts, sketched below, though I’m not sure I have it right or that it would help.
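If non-centering counts, this is roughly what I have in mind for the a_an intercepts (the z_an name is mine; the data block is unchanged):

// data block unchanged
parameters{
  real a;
  vector[N_an] z_an;            // standardized intercepts, replaces a_an as a parameter
  vector[N_st] a_st;
  vector[N_gr] a_gr;
  real<lower=0> sigma_an;
  real<lower=0> sigma;
}
transformed parameters{
  vector[N_an] a_an = sigma_an * z_an;  // implies a_an ~ normal(0, sigma_an)
}
model{
  vector[N] mu;
  sigma ~ cauchy(0, 1);
  sigma_an ~ cauchy(0, 1);      // half-Cauchy retried here
  z_an ~ normal(0, 1);          // replaces a_an ~ normal(0, sigma_an)
  a_st ~ normal(0, 1);
  a_gr ~ normal(0, 1);
  a ~ normal(1, 1);
  for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
  Area_s ~ lognormal(mu, sigma);
}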

The pairs plot for the model with the half-Cauchy prior looks like this (energy__ and sigma_an are clearly correlated):

[pairs plot: pairs_hcauchy]

Now, when I use a lognormal(0, 1) prior for sigma_an, the model runs smoothly and the a_an parameters do get regularized: the number of effective parameters according to WAIC is much lower than for the model without an adaptive prior. I used the lognormal because the problems seemed to occur at close-to-zero values of sigma_an, and I don’t expect sigma_an to be close to zero anyway. The lognormal(0, 1) has less density close to zero but still a fairly heavy tail. The pairs plot looks like this:

[pairs plot: pairs_lognormal]
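To put a number on the “less density close to zero” point (my own quick check, not something from the prior recommendations): the lognormal(0, 1) density, exp(-(log sigma)^2 / 2) / (sigma * sqrt(2 * pi)), goes to 0 as sigma -> 0, whereas the half-Cauchy(0, 1) density, 2 / (pi * (1 + sigma^2)), is about 0.64 at sigma = 0.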

However, the lognormal prior is not one of the recommended priors according to the prior choice recommendations, and there is still some correlation between energy__ and sigma_an (although check_energy() reports “no pathological behavior”). Since I’m fairly new to Stan and Bayesian stats (mainly self-taught, yikes) I am afraid of doing something that I should not be doing (besides meddling with Stan).

My questions are: 1) Is it “allowed” to use a lognormal prior for the standard deviation? 2) Is there a way to reparameterize this model so that a half-Cauchy/exponential prior can be used without the bfmi-low warning? 3) Can I just ignore the bfmi-low warning?

Thanks for reading,

Seb


I’m also curious about this. Since, for various reasons, I’d rather have all my base parameters on an unconstrained scale, I generally work with the log sd, which I guess would solve or at least help with the estimation problems here. Is there any simulation work comparing the approaches? I’m aware of Gelman (2006), and while I ‘kind of’ agree that the probability of a zero sd shouldn’t be zero, I didn’t find the examples convincing, and I wonder whether we are worrying unnecessarily about a tiny range near zero that is irrelevant for practical problems, while compromising our distributions / flexibility elsewhere.
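To be concrete, by “working with the log sd” I mean something like the sketch below (the log_sigma_an name is mine; the data block is unchanged). As far as I can tell, a normal(0, 1) prior on the log sd is exactly the lognormal(0, 1) prior on sigma_an used above, just expressed on an unconstrained scale:

// data block unchanged
parameters{
  real a;
  vector[N_an] a_an;
  vector[N_st] a_st;
  vector[N_gr] a_gr;
  real log_sigma_an;            // log sd, unconstrained
  real<lower=0> sigma;
}
transformed parameters{
  real<lower=0> sigma_an = exp(log_sigma_an);
}
model{
  vector[N] mu;
  sigma ~ cauchy(0, 1);
  log_sigma_an ~ normal(0, 1);  // equivalent to sigma_an ~ lognormal(0, 1)
  a_an ~ normal(0, sigma_an);
  a_st ~ normal(0, 1);
  a_gr ~ normal(0, 1);
  a ~ normal(1, 1);
  for (i in 1:N) {
    mu[i] = a + a_an[idx_an[i]] + a_st[idx_st[i]] + a_gr[idx_gr[i]];
  }
  Area_s ~ lognormal(mu, sigma);
}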

I am responding for two reasons: I am sort of happy that someone else is struggling with the same problems, and I want to bump the post to the top of the thread for more visibility. I hope the moderators don’t mind.
