Reparameterizing a non-centered hierarchical logistic multiple regression model to avoid divergences in HMC

I have the following large non-centered hierarchical logistic multiple regression model:

data {
  int<lower=0> N; // number of samples
  int<lower=0> numCities; // number of cities
  int<lower=0> numSources; // number of sources
  int<lower=0> numTypes; // number of types
  int<lower=0> numSeasons; // number of seasons
  int<lower=0> numMonths; // number of months
  int<lower=0> numYears; // number of years
  int<lower=0> numStatuses; // number of species statuses
  int<lower=0, upper=1> mislabelled[N]; // vector of mislabelled samples
  int<lower=1, upper=numCities> city[N]; // vector of cities
  int<lower=1, upper=numSources> source[N]; // vector of sources
  int<lower=1, upper=numTypes> type[N]; // vector of types
  int<lower=1, upper=numSeasons> season[N]; // vector of seasons
  int<lower=1, upper=numMonths> month[N]; // vector of months
  int<lower=1, upper=numYears> year[N]; // vector of years
  int<lower=1, upper=numStatuses> status[N]; // vector of species statuses
}

parameters {
  real beta0;
  vector[numCities] beta1_raw;
  vector[numSources] beta2_raw;
  vector[numTypes] beta3_raw;
  vector[numSeasons] beta4_raw;
  vector[numMonths] beta5_raw;
  vector[numYears] beta6_raw;
  vector[numStatuses] beta7_raw;

  // hyperparameters

  // city
  real mu_beta1;
  real<lower=0> sigma_beta1;

  // source
  real mu_beta2;
  real<lower=0> sigma_beta2;

  // type
  real mu_beta3;
  real<lower=0> sigma_beta3;

  // season
  real mu_beta4;
  real<lower=0> sigma_beta4;

  // month
  real mu_beta5;
  real<lower=0> sigma_beta5;

  // year
  real mu_beta6;
  real<lower=0> sigma_beta6;

  // status
  real mu_beta7;
  real<lower=0> sigma_beta7;
}

transformed parameters {
  vector[numCities] beta1 = mu_beta1 + sigma_beta1 * beta1_raw;
  vector[numSources] beta2 = mu_beta2 + sigma_beta2 * beta2_raw;
  vector[numTypes] beta3 = mu_beta3 + sigma_beta3 * beta3_raw;
  vector[numSeasons] beta4 = mu_beta4 + sigma_beta4 * beta4_raw;
  vector[numMonths] beta5 = mu_beta5 + sigma_beta5 * beta5_raw;
  vector[numYears] beta6 = mu_beta6 + sigma_beta6 * beta6_raw;
  vector[numStatuses] beta7 = mu_beta7 + sigma_beta7 * beta7_raw;
}

model {
  vector[N] eta;
  beta0 ~ normal(0, 1);
  mu_beta1 ~ normal(0, 5);
  sigma_beta1 ~ cauchy(0, 5);
  mu_beta2 ~ normal(0, 5);
  sigma_beta2 ~ cauchy(0, 5);
  mu_beta3 ~ normal(0, 5);
  sigma_beta3 ~ cauchy(0, 5);
  mu_beta4 ~ normal(0, 5);
  sigma_beta4 ~ cauchy(0, 5);
  mu_beta5 ~ normal(0, 5);
  sigma_beta5 ~ cauchy(0, 5);
  mu_beta6 ~ normal(0, 5);
  sigma_beta6 ~ cauchy(0, 5);
  mu_beta7 ~ normal(0, 5);
  sigma_beta7 ~ cauchy(0, 5);

  beta1_raw ~ normal(0, 1);
  beta2_raw ~ normal(0, 1);
  beta3_raw ~ normal(0, 1);
  beta4_raw ~ normal(0, 1);
  beta5_raw ~ normal(0, 1);
  beta6_raw ~ normal(0, 1);
  beta7_raw ~ normal(0, 1);

  for (i in 1:N) {
    eta[i] = beta0 + beta1[city[i]] + beta2[source[i]] + beta3[type[i]] + beta4[season[i]] + beta5[month[i]] + beta6[year[i]] + beta7[status[i]];
  }
  mislabelled ~ bernoulli_logit(eta);
}

However, when trying to fit it via stan() in R, I get a warning about a number of divergent transitions in the Markov chains. To try to address this, I increased the number of iterations and set adapt_delta = 0.99, to no avail. I also tried fiddling with the hyperparameters.

Here is my fake data:

N <- 2
numCities <- 25
numSources <- 3
numTypes <- 4
numSeasons <- 4
numMonths <- 12
numYears <- 2
numStatuses <- 5
mislabelled <- sample(0:1, N, replace = TRUE)
city <- sample(1:numCities, N, replace = TRUE)
source <- sample(1:numSources, N, replace = TRUE) 
type <- sample(1:numTypes, N, replace = TRUE) 
season <- sample(1:numSeasons, N, replace = TRUE)
month <- sample(1:numMonths, N, replace = TRUE)
year <- sample(1:numYears, N, replace = TRUE)
status <- sample(1:numStatuses, N, replace = TRUE)

(fit <- stan("mislabelled_hierarchical.stan", data = list(N = N, 
                                             numCities = numCities,
                                             numSources = numSources,
                                             numTypes = numTypes,
                                             numSeasons = numSeasons,
                                             numMonths = numMonths,
                                             numYears = numYears,
                                             numStatuses = numStatuses,
                                             mislabelled = mislabelled, 
                                             city = city, 
                                             source = source, 
                                             type = type, 
                                             season = season, 
                                             month = month, 
                                             year = year, 
                                             status = status),
             iter = 2000, control = list(adapt_delta = 0.99)))

Does anyone have any advice?

For your working example, is N really supposed to be 2?

N = 2 is just to keep things small so I can efficiently examine the output of the fitted model. In reality N is much larger (say, 1000 or so).

The reason I ask is that if I fit your example and simply change N <- 1000, then I don't get any divergent transitions.
If I fit your N <- 2 data with your model and change your priors on the mu_beta's to normal(0, 1.5) and the priors on the sigma_beta's to normal(0, 1), then I also don't get any divergent transitions.
Your priors are pretty broad, considering that they're on the logit scale. Try some prior predictive checks to see this.
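To see how broad those priors are on the logit scale, a quick prior predictive simulation in R helps (a sketch, not part of your model; it draws one group-level effect per predictor under your original priors):

```r
# Prior predictive check: what mislabelling probabilities do the
# original priors (normal(0, 5) means, half-Cauchy(0, 5) scales) imply?
set.seed(1)
n_sims <- 10000
sim_prob <- replicate(n_sims, {
  beta0 <- rnorm(1, 0, 1)
  # one non-centered group-level effect per predictor (7 predictors)
  effects <- vapply(1:7, function(j) {
    mu    <- rnorm(1, 0, 5)
    sigma <- abs(rcauchy(1, 0, 5))  # half-Cauchy via folding
    mu + sigma * rnorm(1, 0, 1)
  }, numeric(1))
  plogis(beta0 + sum(effects))
})
hist(sim_prob)  # most prior mass piles up near 0 and 1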
Note that there is a std_normal() distribution in Stan (see section 19.1, Normal distribution, in the Stan Functions Reference) if you want to use it for your standardized group-level effects.
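Putting both suggestions together, the relevant parts of your model block would look something like this (a sketch of the tightened priors only, not a drop-in replacement for the whole block):

```stan
// tighter priors on the logit scale
mu_beta1 ~ normal(0, 1.5);
sigma_beta1 ~ normal(0, 1);  // half-normal, given the <lower=0> constraint
// ... likewise for mu_beta2..mu_beta7 and sigma_beta2..sigma_beta7 ...

// std_normal() for the standardized raw effects
beta1_raw ~ std_normal();
// ... likewise for beta2_raw..beta7_raw ...
```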
Also, this model could be efficiently fit in brms with a few lines of code. You could even use the stancode() function to extract the Stan code from the brms model and modify it, if your real model needs something that brms can't do.
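For example (a sketch, untested; d is assumed to be a data frame holding your variables, and treating each factor as a varying intercept is my assumption about the structure you want):

```r
library(brms)
fit <- brm(mislabelled ~ 1 + (1 | city) + (1 | source) + (1 | type) +
             (1 | season) + (1 | month) + (1 | year) + (1 | status),
           data = d, family = bernoulli(),
           prior = c(prior(normal(0, 1.5), class = Intercept),
                     prior(normal(0, 1), class = sd)))
```

brms uses the non-centered parameterization for these group-level effects by default, so this corresponds closely to your hand-written model.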
Hope that helps.

@jd_c Thanks, this helps! I’ll play around a bit with the predictive checks for both prior and posterior. Since this is fake data, the real data is expected to be much less well behaved, especially when I begin to add interaction terms. I will consider your answer to be the solution to my problem.
