Reparameterizing gamma distribution to avoid low BFMI

Hi folks,
I have a model that consistently returns low Bayesian Fraction of Missing Information warnings on both real and simulated data. My pairs plot suggests that the shape and rate parameters of the gamma distribution are contributing to the low BFMI; however, I’ve struggled to reparameterize the distribution in a way which resolves the issue. I have attempted the parameterization described here, which does seem to reduce the correlation between the parameters of the gamma distribution, but still produces a BFMI warning.

Would you have any advice on how to reparameterize the gamma distribution (or take other steps) to avoid the low BFMI warning?

Model code:

data {
  int<lower=0> n_obs_known;
  vector[n_obs_known] HAT_known;
  int<lower=0> n_obs_unknown;
  vector[n_obs_unknown] HAT_unknown;
}

parameters {
  real mu_bias;
  real<lower=0> sigma_error;
  real<lower=0, upper=1> flight_prior;
  real<lower=0> shape;
  real<lower=0> rate;
  real<lower=0> real_alt[n_obs_unknown];
}

transformed parameters {
  real unknown_q[n_obs_unknown, 2];
  
  for(i in 1:n_obs_unknown){ 
    // Marginalized discrete parameter (https://mc-stan.org/docs/stan-users-guide/latent-discrete.html)
    // Measuring the likelihood that a given location is in a ground or flight state
    unknown_q[i, 1] = normal_lpdf(HAT_unknown[i]| mu_bias, sigma_error) + log1m(flight_prior);
    unknown_q[i, 2] = normal_lpdf(HAT_unknown[i]| real_alt[i] + mu_bias, sigma_error) + log(flight_prior);
  }
}

model {
  // likelihood of known locations
  target += normal_lpdf(HAT_known| mu_bias, sigma_error); 
  
  // likelihood of unknown locations
  for (i in 1:n_obs_unknown){ 
    target += log_sum_exp(unknown_q[i, 1], unknown_q[i, 2]);
  }
  
  //describing the altitude distribution
  real_alt ~ gamma(shape, rate);
  
  //priors
  mu_bias ~ normal(0, 1); // can be negative
  sigma_error ~ uniform(0, 1); // cannot be negative
  flight_prior ~ beta(30.5, 60); // informative prior: mean = 0.33, sd = 0.05 
  shape ~ normal(0, 5) T[0,];
  rate ~ normal(0, 10) T[0,];
}

R code to generate the simulated data:

library(dplyr)
library(purrr)
library(rstan)

options(mc.cores = 4)

# Known ground locations
known_ground <- 13000
measurement_error <- 50/2200

known_ground_df <- tibble(HAT = rnorm(n=known_ground, mean=0, sd=measurement_error), HAT_index = rep(1, known_ground))

# Potential flight locations
unknown_flight <- 141 
unknown_ground <- 287 

shape <- 1.25
rate <- 7.82

unknown_flight_df <- tibble(HAT = rgamma(unknown_flight, shape = shape, rate = rate), HAT_index = rep(1, unknown_flight))

unknown_flight_df$HAT <- unknown_flight_df$HAT %>%
  map(.f = function(x){ # incorporate measurement error
    rnorm(n = 1, mean = x, sd = measurement_error)
  }) %>%
  unlist()

unknown_ground_df <- tibble(HAT = rnorm(n=unknown_ground, mean=0, sd=measurement_error), HAT_index = rep(0, unknown_ground))
unknown_df <- bind_rows(unknown_flight_df, unknown_ground_df)

# Compile and fit
model_compiled <- stan_model("stan_model.stan")

init <- function(){list(mu_bias = rnorm(1,0,0.2),
                        sigma_error = runif(1,0,0.2),
                        shape = runif(1,3,5),
                        rate = runif(1,5,10),
                        flight_prior = 0.33)}

fit <- sampling(model_compiled, data = list(n_obs_known = nrow(known_ground_df),
                                            HAT_known = known_ground_df$HAT,
                                            n_obs_unknown = nrow(unknown_df),
                                            HAT_unknown = unknown_df$HAT), 
                init = init,
                pars = c("mu_bias", "sigma_error", "shape", "rate", "flight_prior"), 
                iter = 15000, 
                chains = 4)

As I am the one who came up with this idea, I would love to know how using this reparameterization helps your situation.

In a nutshell, reparameterize the distribution in terms of its mean and its shape parameter by writing your own _lpdf function like was done in a reply to my link. In the short example there, this method had better sampling than the built in gamma distribution. Good luck!

HI, @LABerigan, and welcome to the Stan forums. Also, thanks for posting such a clear illustration of what’s going on with the pairs plot

Stan requires variables to have support over their declared constraint. Here, sigma_error is declared to be positive but the distribution assumes it’s in (0, 1). To fix the inconsistency, either (a) replace the prior with one defined over all positive numbers [recommended], or (b) declare sigma_error with upper=1.

Also, you don’t need the truncation modifiers T[0, ], because the parameters to the normal distribution being truncated are constants—the truncation just adds a constant log CCDF value, which is expensive to compute and not necessary for sampling.

Your real problem is here:

This is essentially a centered prior, which tends to induce bad geometry in hierarchical models. It might help to follow @JohnnyZoom4H’s reparameterization, but ideally you’d take it all the way to a non-centered parameterization.

The question is whether you really need this to be a gamma distribution? If you use something like lognormal, it’s much easier to use a non-centered parameterization.

parameters {
  real mu_alt;
  real<lower=0> sigma_alt;
  vector<offset=mu_alt, multiplier=sigma_alt>[n_obs_unknown] log_real_alt; 

transformed parameters {
  vector<lower=0>[n_obs_unknown]  real_alt = exp(log_real_alt);
  
model {
  log_real_alt ~ normal(mu_alt, sigma_alt);

The offset and multiplier converts to the non-centered parametrization, but you still have to do the manual exponentiation. You don’t need a Jacobian adjustment here because the prior is going on the base parameter, log_real_alt—the result is that real_alt ~ lognormal(mu_alt, sigma_alt);

This reparameterization is great, thank you! It didn’t fully solve the BFMI issue, but this does increase the effective sample size. This also allows me to set an informative prior for the mean, which could help to resolve uncertainty in other parts of the model.

Thank you for the explanation- still new to Stan, so the details about how to specify the model correctly are really helpful. A lognormal distribution should be a good fit to the shape of the data- I didn’t realize that it had better options for a non-centered parameterization than the gamma. I’ll give it a shot!

I should qualify that with “as far as I know”. There are a lot of people who are very clever with reparameterizations like @spinkney and @bgoodri who may know how to do this well.