Divergences with logistic multilevel model with binary predictor

I’m trying to fit a multilevel logistic regression model but encounter divergences. I suspect that something is wrong with my model specification, or that the model is simply not identifiable from the data. Since I’m still working out the analysis design for the experiment, I’m currently working purely with simulated data.

The data will come from an experiment. The outcome variable is binary. The treatment variable (intervention) is also binary. The experiment will be run across a couple of sites with different sample sizes. If available, we might also incorporate further categorical predictors, but these are not included at the moment.

The base of the model reads like this:

base_model <- brm(outcome ~ intervention + (1 + intervention|site),
                  family = brms::bernoulli())

I’m primarily interested in the overall treatment effect across all study sites. However, it would also be interesting to investigate individual slopes for the study sites, depending on the sample size we achieve. This is why I opted for varying slopes and intercepts.
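
Once the model is fitted to data, this is roughly how I would read off the overall effect and the site-specific slopes (a sketch; fixef() and coef() are the standard brms accessors, and the slope should show up as interventionTRUE since the predictor is logical):

# overall (population-level) treatment effect on the log-odds scale
fixef(base_model)["interventionTRUE", ]

# per-site intercepts and slopes (population-level + group-level effects combined)
coef(base_model)$site[, , "interventionTRUE"]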

To avoid the divergences, I have increasingly tightened the priors. The current ones are these:

c(
	prior(normal(0, 1), class = b),
	prior(normal(0, .2), class = sd),
	prior(normal(-1, 1), class = Intercept),
	prior(lkj_corr_cholesky(4), class = L)
)

This has helped in part, but not in full. I reran the simulation from the “true” parameters and the subsequent analysis 200 times, and the intervals and estimates seem reasonably well calibrated (13/200 = 6.5% of the 95% intervals do not contain the true parameter value).
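
To give an idea of what that check looked like, here is a condensed sketch of the loop (it reuses the data-generating code below, assumes the prior vector is stored in an object called priors, and uses avg_comparisons() from marginaleffects for the average risk difference; update() avoids recompiling the model on every iteration):

simulate_once <- function() {
  parameters %>%
    mutate(outcome = map2(n, prob, ~ bernoulli(.x / 2, .y))) %>%
    unnest(outcome)
}

# compile the model once, then refit with update() on each new dataset
fit0 <- brm(outcome ~ intervention + (1 + intervention | site),
            data = simulate_once(), family = brms::bernoulli(),
            prior = priors, cores = 4, refresh = 0)

covered <- map_lgl(1:200, function(i) {
  fit_i <- update(fit0, newdata = simulate_once(), cores = 4, refresh = 0)
  ci <- avg_comparisons(fit_i, variables = "intervention")
  ci$conf.low <= true_effect && true_effect <= ci$conf.high
})
mean(covered)  # proportion of 95% intervals covering the true effect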

However, I still encounter divergences, especially when simulating larger datasets.
Is there something grossly wrong with how I’m trying to model the data? Or might there simply be insufficient data to identify such a model?

For reference, this is the data generating code:

library(dplyr)
library(purrr)
library(tidyr)   # needed for unnest()
library(brms)
library(marginaleffects)

# draw n Bernoulli(p) outcomes as logicals (distinct from the brms::bernoulli() family)
bernoulli <- function(n, p) {
  runif(n) < p
}

parameters <- tibble(
  site = LETTERS[1:5] %>% rep(each = 2),
  intervention = rep(c(TRUE, FALSE), 5)
) %>% 
  mutate(
    n = case_when(
      site == "A" ~ 30,
      site == "B" ~ 300,
      site == "C" ~ 50,
      site == "D" ~ 20,
      site == "E" ~ 100
    ),
    intercept = case_when(
      site == "A" ~ .1,
      site == "B" ~ .15,
      site == "C" ~ .01,
      site == "D" ~ .05,
      site == "E" ~ .1
    ),
    effect = case_when(
      site == "A" & intervention ~ 0,
      site == "B" & intervention ~ .1,
      site == "C" & intervention ~ .2,
      site == "D" & intervention ~ -.1,
      site == "E" & intervention ~ .1,
      TRUE ~ 0
    ),
    prob = intercept + effect)


# true effect
true_effect <- parameters %>% 
  filter(intervention) %>% 
  summarise(avg_effect = weighted.mean(effect, n)) %>% 
  pull(avg_effect)

The following seed seems to work fine with the given priors, but other seeds produce divergences.

set.seed(0297346)
simulated_data <- parameters %>% 
  mutate(
    # the n corresponds to the total sample size per site, so we need to divide
    # by two to reach the sample size for each condition
    outcome = map2(n, prob, ~bernoulli(.x / 2, .y))) %>% 
  unnest(outcome)

fit1 <- brm(outcome ~ intervention + (1 + intervention|site),
            data = simulated_data, family = brms::bernoulli(),
            prior = c(
              prior(normal(0, 1), class = b),
              prior(normal(0, .5), class = sd),
              prior(normal(-1, 1), class = Intercept),
              prior(lkj_corr_cholesky(4), class = L)
            ),
            cores = 4)
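
And for reference, this is how I count the divergent transitions after fitting (nuts_params() is re-exported by brms from bayesplot; divergences show up under the divergent__ parameter):

np <- nuts_params(fit1)
sum(np$Value[np$Parameter == "divergent__"])  # total number of divergent transitions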

How many levels of site?

1 Like

It looks like you may only have 5 levels of site and, in addition, very different numbers of observations per site. See section 3 and section 4.2.3 of Hierarchical Modeling. When you have unbalanced likelihood functions, where some levels have much more data than others, you can actually see divergences with both the centered and non-centered parameterizations. Neither will do the job well.

The centered parameterization works best when you have uniformly strongly informative likelihood functions, and the non-centered parameterization when you have uniformly weakly informative likelihood functions. In the scenario of severely unbalanced likelihood functions, you may need to parameterize those levels of site that have a lot of data with the centered parameterization and those with little data with the non-centered parameterization. brms uses the non-centered parameterization by default, as it is typically the better choice in practice. You can implement both centered and non-centered in Stan, as shown by Betancourt’s example in section 4.2.3.
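
In simplified notation, for a single group-level effect $\theta_j$ per site, the two parameterizations are:

Centered: $\theta_j \sim \mathrm{Normal}(\mu, \tau)$

Non-centered: $\theta_j = \mu + \tau \, z_j$, with $z_j \sim \mathrm{Normal}(0, 1)$

The centered form samples $\theta_j$ directly; the non-centered form samples standardized offsets $z_j$ and rescales them by $\tau$, which changes the posterior geometry the sampler has to deal with.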

1 Like

Thanks, this makes a lot of sense! My conclusions:

  • Either I manage to obtain a reasonably balanced sample across the sites, in which case the non-centred implementation of brms should work fine, or, if that is not possible, I implement the model in Stan with a mixed parameterisation, as shown in the case study.
  • Reading on into section 4.2.4, I might still run into trouble if I somehow end up with only, say, 3 or 4 sites, in which case I can either use more informative priors or simply drop the multilevel aspect and run an ordinary logistic regression, since increasingly strong priors on the group-level standard deviations pull the model towards complete pooling, which is effectively the same as having no hierarchical structure.

On the first point, I’ll experiment with a more balanced dataset, and with the mixed parameterisation for the current data. Since I have no experience with implementing models directly in Stan, I might end up posting again regarding the actual implementation of the mixed parameterisation.

1 Like

In practice, I have found that the non-centered parameterization works pretty well even with unbalanced samples across the levels, as long as the imbalance is not too severe. It’s just that the small number of sites plus the extremely unbalanced nature of your sim may be giving you trouble.
Note that your prior should reflect domain expertise and not be used as a tuning knob to remove divergences. Whatever prior you end up with should at the very least provide some soft containment of absurd values.
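
For example, a quick way to see what a candidate prior implies on the outcome scale is to push prior draws through the inverse logit (a sketch for your normal(-1, 1) intercept prior):

# implied baseline probabilities under a normal(-1, 1) prior on the logit-scale intercept
quantile(plogis(rnorm(1e5, mean = -1, sd = 1)), probs = c(0.025, 0.5, 0.975))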

1 Like