Bad diagnostics with multilevel models

Hi there,
I’m new to Bayesian statistics and so I really appreciate the easy to use and very flexible brms package - thanks a lot!
Right now I’m trying to fit general logistic growth curves (cell population ratio in percentage) for about 500 patients with about 3-7 measurement points each. Single level/ no Pool fits do work fine, but the diagnostic values look bad for my multilevel models (Rhat >> 1.05, Bulk and Tail ESS << 10%, MSCE, MCMC plots don’t converge, divergent transitions over 99% …).
I already tried to

  1. increase maximum treedepth(18) and adapt_delta (0.999)
  2. create and vary initial value lists (see Salomon Kurz “Don’t forget your inits”)
  3. use more informative priors (especially with data subsets s. 5a)
  4. increase iter (up to 200.000 in each of four chains)
  5. data wrangling
    a) only including patients with a > 95% ratio at 30 days
    b) excluding data with decreasing points
    c) standardize, normalize
    d) divide values by max for each patient and setting upper asymptote to one (d = 1)
    → only to variable analysis
  6. work with simulated data (trying various numbers of individuals and measurements per patient)
  7. try different formulas (Weibull, Gompertz, …)

… but with all these procedures I still get bad diagnostics. Except for one data subset of 30 patients with very similar data structure (see 5 a) increasing to 100%, having > 4 measurements and an early turning point and high slope).
I want to get a robust multilevel model for as many patients as possible.

What am I missing/ what can I try to improve?
Am I stuck in the funnel of multilevel models and may it be helpful to try a centered model ?

Here are my simulated data … you can very the number of patients and individuals within 30 days.
In my real data frame patients have at least 3 measurements.


  d_bar <- 99.5 # mean value of upper asymptote
  e_bar <- 15   # -"- turning point
  b_bar <- -15  # -"- slope
  sigma_d <- 20 
  sigma_e <- 5
  sigma_b <- 5
n <- 30 # number of patients
m <- 10 # number of measurements

dsim <- data.frame(
## defining a "grid"/ numbers of individuals + sample days per UPN ----
    ID = patientID <- rep(1:n, each = m),
    days = days <- rep(seq(from = 0, to = 30, length.out = m), times = n), # defining range of Days
## defining hyperparameters (parameters of parameters) ----
    d = d <- rep(rtruncnorm(n, mean = d_bar, sd = sigma_d, a = 0, b = 100), each = m),
    e = e <- rep(rtruncnorm(n, mean = e_bar, sd = sigma_e, a = 0), each = m),
    b = b <- rep(rtruncnorm(n, mean = b_bar, sd = sigma_b, b = 0), each = m),
## cell growth definition ----,
    growth = cell_growth <- d / (1 + (days/ e)^b)

ggplot(dsim, aes(days, growth, group = ID, color = ID)) + 
  geom_point() + geom_line() 

And here is one default model with diagnostical problems, so do have all the other models as well (see above). I got the formula for the from the drc package’s paper by Ritz and Streibig 2015 “from additivity to synergism a modeling perspective” (formula 6 with c = lower asymptote = 0).


partPool_sim <- brm(bf(growth ~ d / (1 + (days/ e)^b),
                        d + b + e ~ (1|ID), 
                        nl = TRUE),
                    data = dsim,
                    prior = c(prior(normal(50, 25), nlpar = "d", ub = 100, lb = 0),
                              prior(normal(-10, 10), nlpar = "b", ub = 0),
                              prior(normal(15, 10), nlpar = "e", lb = 0),
                              prior(cauchy(0, 10), class = "sd", nlpar = "d", lb = 0),
                              prior(cauchy(0, 10), class = "sd", nlpar = "b", lb = 0),
                              prior(cauchy(0, 10), class = "sd", nlpar = "e", lb = 0),
                              prior(cauchy(0, 10), class = "sigma", lb = 0)),
                    warmup = 5000, iter = 10000,
                    chains = 4, cores = 4,
                    backend = "cmdstanr")


The population (mean of d, b and e) and group level effects (sd of d, b and e’s means) have Rhats around 4 and the inidivual fits sometimes over 1000 !
The estimates themselves are reasonable and suit the measurement points over time (see two added posterior predictive plots).

Thanks so much for your help!

Few things to consider:

  1. if this outcome is bounded, then the constant normal error model is highly problematic as it will definitely predict responses outside of the bounds. Particularly for those observations at or close too the bound, they will be essentially perfectly predicted (with a well-fitting model), though the constant normal error model must imply that the conditional distribution is partially outside the bound (about half of it, in fact).
  2. those priors seem extremely wide. Particularly cauchy(0,10) has tails that extend well beyond |60|.
  3. I generally find it more effective to transform rather than use bounded priors. For example, the positive-constrained parameter could have a normal prior but be log-transformed within the model. I’d say this is the typical approach for pharmacometrics.
  4. the upper asymptote being so close the boundary of the prior (100) might make it very difficult to sample. I’d consider carefully what the meaning of its upper boundary is. If the response variable is a percentage, then you may be better off using an absolute response variable instead.

Hi AWoodward,
thank you so much for your answer which was really helpful!
I’m sorry I didn’t answer you straightaway, but I got COVID directly after the creation of this topic and needed some time to recover…
Here are some more thoughts and questions to your points:

  1. Thanks for pointing out the prediction problems at the upper and lower asymptote with the normal error. I tried to improve these with another brm argument brm(bf(growth|trunc(lb = 0, ub = 100) ~ d/ 1 + (days/e)^b), …). This will help calculcating more reasonable outcome predictions, but the diagnostics stay the same.
    I guess this isn’t the solution you’d prefer - as I set boundaries again.
    Which error model would you suggest instead of a normal error and how would you implement this in brms (via the family() argument)? Would you use a constant or a varying error model (e.g. a varying sigma for each UPN)?
    Or should I log-transform the outcome (growth) and use a lognormal or gamma response?

  2. I tried different priors, e.g. normal(5, 2.5) and also with different data subsets (e.g. only > 95% at the last measurement), but still the converegnce diagnostic won’t work.

  3. I totally missed to log transform, thanks for this suggestion!
    Do you mean only log-tranforming the model’s parameters or also the outcome (growth)? Could you give an example for a model’s definition?

  4. I can’t use absolute response variables, because I only got the percentage values.

Thanks a lot again. I’m looking forward to try the log-transformation and the lognormal() models!

Hope you have recovered okay!

  1. I’d say that the beta distribution would be worth considering as it is bounded between 0 and 1. This may be feasible with a constant error model, and probably a distributional (varying error) model will be too difficult to fit in this case anyway. So try family = Beta(link = 'identity'). One issue you might find here is that the beta distribution is supported on (0,1), that is, it cannot take values of 0 or 1. So if you have response values of 0% or 100% in your data, these cannot be predicted. We discussed this issue a while ago here Non-linear models using family = Beta(); I believe @saudiwin has since published about the ordered beta model discussed there so you could explore that, or perhaps censoring of the response, depending on what suits your problem.

  2. I think that the priors are important for this case, but probably changing the priors alone won’t help that much.

  3. well, say if e is positively constrained, perhaps you could write your formula as brm(bf(growth ~ d / (1 + (days/ (exp(e)))^b); then your model is effectively for log(e), and its between-subject (or whatever the grouping is) distribution will be log-normal.

  4. you don’t really have any choice then! In my experience with data like these people seem to want to represent the observations as ‘proportion of subject’s maximum response’ or similar, which seems problematic to me. I suppose it makes more sense if the data represent a variable that is in fact a proportion, say the proportion of living cells in the sample from flow cytometry, or something like that.

1 Like