Divergent transitions in a hierarchical lognormal regression model

I am modelling a dataset from a neuroscience experiment. When I’m trying to incorporate hierarchy in the model, I’m getting a few divergent transitions after warmup, for a variety of distribution families and model definitions. I would appreciate your advice on this; bellow I provide more information about my data and what I tried.

Experimental mice (N=4; total of ~11K trials) from a specific genetic strain were allowed to interact with mice from different genetic strains (CueID: a factor variable with six levels), and the dependent measure is the time they spent interacting (DwellTime: a strictly positive and heavily skewed variable). This first figure demonstrates that each mouse has a noticeable number of extreme observations:

This second figure summarizes how DwellTime looks in different conditions for each of the four mice.

My first step was to fit a regression model that predicts DwellTime using the CueID factor variable without any animal-specific terms:

post_DT_lognormal <- brm(formula = DwellTime ~ CueID,
                     data = big_dframe, family = lognormal(), warmup = 2000,
                     iter = 3000, chains = 4, control = list(adapt_delta = 0.98))

The diagnostics were fine and it seemed like the lognormal distribution did a good job; I tried also weibull(), Gamma(), gen_extreme_value(), frechet(), and exgaussian() families, but lognormal() mixed significantly fastest and had better loo log-likelihood.

When I introduced the next level of complexity, i.e., animal-specific intercepts:

post_DT_lognormal_mixeff_int_only <- brm(formula = DwellTime ~ CueID + (1 | AnimalID),
                                         data = big_dframe, family = lognormal(), warmup = 2000,
                                         iter = 3000, chains = 4, control = list(adapt_delta = 0.99, max_treedepth = 15))

I encountered a few divergent transitions after warmup (<10 across different attempts). Note that the model with the animal-specific intercepts had a better loo log-likelihood as compared to a population-effects only model.

                                                         LOOIC     SE
post_DT_lognormal_mixeff_int_only                     69595.29 312.75
post_DT_lognormal                                     70063.17 304.59
post_DT_lognormal_mixeff_int_only - post_DT_lognormal  -467.88  45.17

When looking at pairs(post_DT_lognormal_mixeff_int_only) () we see that the sd_AnimalID_Intercept term introduces a complex geometry with the other parameters (second column from right). If I understand correctly, brms implements a non-centered parametrization so this is probably not the problem.

And this isn’t solved when I try the other families that I mentioned above; It persists even when the model is in the simplified hierarchical form: formula = DwellTime ~ 1 + (1 | AnimalID), and moreover, when I trimmed outlier observations as well (and we wouldn’t like to throw these observations in any case).
Note also that when I log the data, it doesn’t look like a classic normal distribution, so not sure whether the log-normal distribution is the best here.

Similarly and as expected, when I introduce even more complexity and add animal-specific CueID coefficients:

post_DT_lognormal_mixeff <- brm(formula = DwellTime ~ CueID + (CueID || AnimalID),
                                         data = big_dframe, family = lognormal(), warmup = 2000,
                                         iter = 3000, chains = 4, control = list(adapt_delta = 0.99, max_treedepth = 15))

I get 3 divergent transitions after warmup and similar geometric issues. Again, the model does a better job than the varying-intercepts model and ideally I would prefer it.

                                                                LOOIC     SE
post_DT_lognormal_mixeff                                     69460.41 313.12
post_DT_lognormal_mixeff_int_only                            69595.29 312.75
post_DT_lognormal_mixeff - post_DT_lognormal_mixeff_int_only  -134.88  22.25

Do you have any suggestions?

  • Operating System: macOS Mojave 10.14.2,
  • brms Version: 2.01

My quick guess is that because you only have four animals, estimating the SDs of the animal-specific parameters is difficult. Perhaps you could try with a more informative prior on the SDs (I can’t recommend a specific one, but some people like to use the student_t with a low DF.) You can use the get_prior() function to help see how to specify the priors.

1 Like

Dan mentions <10 divergent transitions occurred after warmup, how many is acceptable? How does having divergent transitions impact the accuracy of the results?

1 Like

I think a little prior information will go a long way. By default, you’re allowing dwell times of up 8 x 10^67 as plausible. You could tighten this down to something like weakly informative 3 hours (~10000 seconds). (I’m assuming dwell times are in seconds.) Then apply this same principle to the random effects too.


data <- tibble(
  DwellTime = exp(rnorm(40, 1.5, 1)),
  AnimalID = rep(1:4, 10)

f <- bf(DwellTime ~ 1 + (1 | AnimalID))
get_prior(f, data = data)
#>                 prior     class      coef    group resp dpar nlpar bound
#> 1 student_t(3, 5, 10) Intercept                                         
#> 2 student_t(3, 0, 10)        sd                                         
#> 3                            sd           AnimalID                      
#> 4                            sd Intercept AnimalID                      
#> 5 student_t(3, 0, 10)     sigma

# Sample of population means on outcome scale
hist(exp(rstudent_t(1000, 3, 5, 10)))

Created on 2019-02-26 by the reprex package (v0.2.1)

I would recommend that you sample from the prior, plot simulated data from the prior, and see what kinds of values are being generated. Is everything under some more reasonable extreme? You do not want to tweak the priors until you get your data back. But you do want dwell times that are shorter than a day in length, I assume.

This is how you sample from the prior in brms.

pd <- brm(
  formula = f,
  sample_prior = "only",
  family = lognormal())

Thanks for your comments @tjmahr and @matti, they were helpful.
It seems to work with no divergent transitions and high n_eff when I define more specific priors, maybe because the shape of my distribution is unique - i.e., it is like a spike with very long yet flat tails, and the population-level difference between the conditions falls majorly within the spike (on the order of hundreds of milliseconds; so do the group-level differences within each condition). I had to restrict the priors to make to model run properly. The prior on "sd" had to be restricted to have a sigma of .1 (very small, isn’t it?), otherwise it would produce divergent transitions. On the exponential scale it still seems big enough.

prior<- c(set_prior("student_t(3, 1.54, 1)", class = "Intercept"),
            set_prior("student_t(3, 0, 1)", class = "b"),
            set_prior("student_t(3,0, .1)", class = "sd"))