Bad diagnostics with multilevel models

Hi there,
I’m new to Bayesian statistics and so I really appreciate the easy to use and very flexible brms package - thanks a lot!
Right now I’m trying to fit general logistic growth curves (cell population ratio in percentage) for about 500 patients with about 3-7 measurement points each. Single level/ no Pool fits do work fine, but the diagnostic values look bad for my multilevel models (Rhat >> 1.05, Bulk and Tail ESS << 10%, MSCE, MCMC plots don’t converge, divergent transitions over 99% …).
I already tried to

  1. increase maximum treedepth(18) and adapt_delta (0.999)
  2. create and vary initial value lists (see Salomon Kurz “Don’t forget your inits”)
  3. use more informative priors (especially with data subsets s. 5a)
  4. increase iter (up to 200.000 in each of four chains)
  5. data wrangling
    a) only including patients with a > 95% ratio at 30 days
    b) excluding data with decreasing points
    c) standardize, normalize
    d) divide values by max for each patient and setting upper asymptote to one (d = 1)
    → only to variable analysis
  6. work with simulated data (trying various numbers of individuals and measurements per patient)
  7. try different formulas (Weibull, Gompertz, …)

… but with all these procedures I still get bad diagnostics. Except for one data subset of 30 patients with very similar data structure (see 5 a) increasing to 100%, having > 4 measurements and an early turning point and high slope).
I want to get a robust multilevel model for as many patients as possible.

What am I missing/ what can I try to improve?
Am I stuck in the funnel of multilevel models and may it be helpful to try a centered model ?

Here are my simulated data … you can very the number of patients and individuals within 30 days.
In my real data frame patients have at least 3 measurements.

library(tidyverse)
library(truncnorm)

  d_bar <- 99.5 # mean value of upper asymptote
  e_bar <- 15   # -"- turning point
  b_bar <- -15  # -"- slope
  sigma_d <- 20 
  sigma_e <- 5
  sigma_b <- 5
  
n <- 30 # number of patients
m <- 10 # number of measurements

dsim <- data.frame(
## defining a "grid"/ numbers of individuals + sample days per UPN ----
    ID = patientID <- rep(1:n, each = m),
    days = days <- rep(seq(from = 0, to = 30, length.out = m), times = n), # defining range of Days
## defining hyperparameters (parameters of parameters) ----
    d = d <- rep(rtruncnorm(n, mean = d_bar, sd = sigma_d, a = 0, b = 100), each = m),
    e = e <- rep(rtruncnorm(n, mean = e_bar, sd = sigma_e, a = 0), each = m),
    b = b <- rep(rtruncnorm(n, mean = b_bar, sd = sigma_b, b = 0), each = m),
## cell growth definition ----,
    growth = cell_growth <- d / (1 + (days/ e)^b)
  )

ggplot(dsim, aes(days, growth, group = ID, color = ID)) + 
  geom_point() + geom_line() 

And here is one default model with diagnostical problems, so do have all the other models as well (see above). I got the formula for the from the drc package’s paper by Ritz and Streibig 2015 “from additivity to synergism a modeling perspective” (formula 6 with c = lower asymptote = 0).

library(brms)
library(cmdstanr)

partPool_sim <- brm(bf(growth ~ d / (1 + (days/ e)^b),
                        d + b + e ~ (1|ID), 
                        nl = TRUE),
                    data = dsim,
                    prior = c(prior(normal(50, 25), nlpar = "d", ub = 100, lb = 0),
                              prior(normal(-10, 10), nlpar = "b", ub = 0),
                              prior(normal(15, 10), nlpar = "e", lb = 0),
                              prior(cauchy(0, 10), class = "sd", nlpar = "d", lb = 0),
                              prior(cauchy(0, 10), class = "sd", nlpar = "b", lb = 0),
                              prior(cauchy(0, 10), class = "sd", nlpar = "e", lb = 0),
                              prior(cauchy(0, 10), class = "sigma", lb = 0)),
                    warmup = 5000, iter = 10000,
                    chains = 4, cores = 4,
                    backend = "cmdstanr")

summary(partPool_sim)
partPool_sim$fit

The population (mean of d, b and e) and group level effects (sd of d, b and e’s means) have Rhats around 4 and the inidivual fits sometimes over 1000 !
The estimates themselves are reasonable and suit the measurement points over time (see two added posterior predictive plots).

Thanks so much for your help!

1 Like

Few things to consider:

  1. if this outcome is bounded, then the constant normal error model is highly problematic as it will definitely predict responses outside of the bounds. Particularly for those observations at or close too the bound, they will be essentially perfectly predicted (with a well-fitting model), though the constant normal error model must imply that the conditional distribution is partially outside the bound (about half of it, in fact).
  2. those priors seem extremely wide. Particularly cauchy(0,10) has tails that extend well beyond |60|.
  3. I generally find it more effective to transform rather than use bounded priors. For example, the positive-constrained parameter could have a normal prior but be log-transformed within the model. I’d say this is the typical approach for pharmacometrics.
  4. the upper asymptote being so close the boundary of the prior (100) might make it very difficult to sample. I’d consider carefully what the meaning of its upper boundary is. If the response variable is a percentage, then you may be better off using an absolute response variable instead.
2 Likes

Hi AWoodward,
thank you so much for your answer which was really helpful!
I’m sorry I didn’t answer you straightaway, but I got COVID directly after the creation of this topic and needed some time to recover…
Here are some more thoughts and questions to your points:

  1. Thanks for pointing out the prediction problems at the upper and lower asymptote with the normal error. I tried to improve these with another brm argument brm(bf(growth|trunc(lb = 0, ub = 100) ~ d/ 1 + (days/e)^b), …). This will help calculcating more reasonable outcome predictions, but the diagnostics stay the same.
    I guess this isn’t the solution you’d prefer - as I set boundaries again.
    Which error model would you suggest instead of a normal error and how would you implement this in brms (via the family() argument)? Would you use a constant or a varying error model (e.g. a varying sigma for each UPN)?
    Or should I log-transform the outcome (growth) and use a lognormal or gamma response?

  2. I tried different priors, e.g. normal(5, 2.5) and also with different data subsets (e.g. only > 95% at the last measurement), but still the converegnce diagnostic won’t work.

  3. I totally missed to log transform, thanks for this suggestion!
    Do you mean only log-tranforming the model’s parameters or also the outcome (growth)? Could you give an example for a model’s definition?

  4. I can’t use absolute response variables, because I only got the percentage values.

Thanks a lot again. I’m looking forward to try the log-transformation and the lognormal() models!

Hope you have recovered okay!

  1. I’d say that the beta distribution would be worth considering as it is bounded between 0 and 1. This may be feasible with a constant error model, and probably a distributional (varying error) model will be too difficult to fit in this case anyway. So try family = Beta(link = 'identity'). One issue you might find here is that the beta distribution is supported on (0,1), that is, it cannot take values of 0 or 1. So if you have response values of 0% or 100% in your data, these cannot be predicted. We discussed this issue a while ago here Non-linear models using family = Beta(); I believe @saudiwin has since published about the ordered beta model discussed there so you could explore that, or perhaps censoring of the response, depending on what suits your problem.

  2. I think that the priors are important for this case, but probably changing the priors alone won’t help that much.

  3. well, say if e is positively constrained, perhaps you could write your formula as brm(bf(growth ~ d / (1 + (days/ (exp(e)))^b); then your model is effectively for log(e), and its between-subject (or whatever the grouping is) distribution will be log-normal.

  4. you don’t really have any choice then! In my experience with data like these people seem to want to represent the observations as ‘proportion of subject’s maximum response’ or similar, which seems problematic to me. I suppose it makes more sense if the data represent a variable that is in fact a proportion, say the proportion of living cells in the sample from flow cytometry, or something like that.

1 Like

Hi AWoodward,
thanks so much again and I’m super sorry I answer so crazy late ! I hope it’s still ok, I post my solution like “it’s never too late.”
The beta regression models I worked were definetly more pretty and got better results for the parameter “d” or upper asymptote than my normal error models. But still the convergence wouldn’t work. As I was more familar with the normal error model I tried to change the formula within this again and thanks to your tip to define log(e) as the parameter I got good diagnostics for that one. Only parameter b still had bad diagnostics and so I changed the formula into Chimerismus ~ d/ (1 + exp(b*(DayAfterTx - exp(e)))), then I got good Rhat, and ESS values - the posterior predictive plots also suited the data quite well. I just wonder what the theoretical background is behin this. Is it the “funnel” effect, e.g. described by Michael Betancourt in “Hierarchical Models” ? So when I decrease the parameter space/ possible values the parameters can take, I reduce the funnel effect and thereby the MCMC don’t get stuck and can cover the whole parameter space in reasonable time.
Next thing I want to try is getting the beta-regression model run with the new formula.
Best regards and thanks a lot again !
Leo

PS: The link to Betancourt’s explanation of problems in modeling hierarchical models: Hierarchical Modeling

PPS: Here’s the model now compiling good, just for interest:

eq_new2 <- bf(Chimerismus ~ d/ (1 + exp(b*(DayAfterTx - exp(e)))),
          #equations for alpha and beta
          d + b + e ~ (1|UPN),
          nl = TRUE)

# update model ----
partPool_newformula_noOut_log_e <- 
  brm(eq_new2,
      data = Chim95NoOut1_update, 
      prior = c(prior(normal(80, 10), nlpar = "d", ub = 100, lb = 0),
                 prior(normal(-1, 0.5), nlpar = "b", ub = 0),
                 prior(normal(2.3, 1.6), nlpar = "e"),
                 prior(normal(10, 5), class = "sd", nlpar = "d"),
                 prior(normal(1.6, 0.8), class = "sd", nlpar = "e"), 
                 prior(normal(5, 2.5), class = "sd", nlpar = "b"),
                 prior(normal(0, 2.5), class = "sigma", lb = 0)),
      control=list(adapt_delta=0.999, max_treedepth = 20),
      chains = 4, cores = 4,
      threads = threading(4), 
      warmup = 5000, iter = 10000,
      backend = "cmdstanr", 
      seed = 123
      )

summary(partPool_newformula_data2_noOut_obj)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: Chimerismus ~ d/(1 + exp(b * (DayAfterTx - exp(e))))
d ~ (1 | UPN)
b ~ (1 | UPN)
e ~ (1 | UPN)
Data: Chimu31_new_noOut_obj (Number of observations: 3130)
Draws: 4 chains, each with iter = 20000; warmup = 10000; thin = 1;
total post-warmup draws = 40000

Group-Level Effects:
~UPN (Number of levels: 503)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(d_Intercept) 1.31 0.13 1.05 1.57 1.00 9279 15450
sd(b_Intercept) 0.50 0.03 0.45 0.57 1.00 2425 6090
sd(e_Intercept) 0.45 0.02 0.42 0.48 1.00 1046 2958

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
d_Intercept 98.40 0.10 98.20 98.60 1.00 24543 27848
b_Intercept -1.06 0.04 -1.14 -1.00 1.00 5466 12519
e_Intercept 1.87 0.02 1.83 1.91 1.00 548 1395

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 3.43 0.05 3.33 3.54 1.00 18318 25449