I’m new to Bayesian statistics and so I really appreciate the easy to use and very flexible brms package - thanks a lot!
Right now I’m trying to fit general logistic growth curves (cell population ratio in percentage) for about 500 patients with about 3-7 measurement points each. Single level/ no Pool fits do work fine, but the diagnostic values look bad for my multilevel models (Rhat >> 1.05, Bulk and Tail ESS << 10%, MSCE, MCMC plots don’t converge, divergent transitions over 99% …).
I already tried to
- increase maximum treedepth(18) and adapt_delta (0.999)
- create and vary initial value lists (see Salomon Kurz “Don’t forget your inits”)
- use more informative priors (especially with data subsets s. 5a)
- increase iter (up to 200.000 in each of four chains)
- data wrangling
a) only including patients with a > 95% ratio at 30 days
b) excluding data with decreasing points
c) standardize, normalize
d) divide values by max for each patient and setting upper asymptote to one (d = 1)
→ only to variable analysis
- work with simulated data (trying various numbers of individuals and measurements per patient)
- try different formulas (Weibull, Gompertz, …)
… but with all these procedures I still get bad diagnostics. Except for one data subset of 30 patients with very similar data structure (see 5 a) increasing to 100%, having > 4 measurements and an early turning point and high slope).
I want to get a robust multilevel model for as many patients as possible.
What am I missing/ what can I try to improve?
Am I stuck in the funnel of multilevel models and may it be helpful to try a centered model ?
Here are my simulated data … you can very the number of patients and individuals within 30 days.
In my real data frame patients have at least 3 measurements.
library(tidyverse) library(truncnorm) d_bar <- 99.5 # mean value of upper asymptote e_bar <- 15 # -"- turning point b_bar <- -15 # -"- slope sigma_d <- 20 sigma_e <- 5 sigma_b <- 5 n <- 30 # number of patients m <- 10 # number of measurements dsim <- data.frame( ## defining a "grid"/ numbers of individuals + sample days per UPN ---- ID = patientID <- rep(1:n, each = m), days = days <- rep(seq(from = 0, to = 30, length.out = m), times = n), # defining range of Days ## defining hyperparameters (parameters of parameters) ---- d = d <- rep(rtruncnorm(n, mean = d_bar, sd = sigma_d, a = 0, b = 100), each = m), e = e <- rep(rtruncnorm(n, mean = e_bar, sd = sigma_e, a = 0), each = m), b = b <- rep(rtruncnorm(n, mean = b_bar, sd = sigma_b, b = 0), each = m), ## cell growth definition ----, growth = cell_growth <- d / (1 + (days/ e)^b) ) ggplot(dsim, aes(days, growth, group = ID, color = ID)) + geom_point() + geom_line()
And here is one default model with diagnostical problems, so do have all the other models as well (see above). I got the formula for the from the drc package’s paper by Ritz and Streibig 2015 “from additivity to synergism a modeling perspective” (formula 6 with c = lower asymptote = 0).
library(brms) library(cmdstanr) partPool_sim <- brm(bf(growth ~ d / (1 + (days/ e)^b), d + b + e ~ (1|ID), nl = TRUE), data = dsim, prior = c(prior(normal(50, 25), nlpar = "d", ub = 100, lb = 0), prior(normal(-10, 10), nlpar = "b", ub = 0), prior(normal(15, 10), nlpar = "e", lb = 0), prior(cauchy(0, 10), class = "sd", nlpar = "d", lb = 0), prior(cauchy(0, 10), class = "sd", nlpar = "b", lb = 0), prior(cauchy(0, 10), class = "sd", nlpar = "e", lb = 0), prior(cauchy(0, 10), class = "sigma", lb = 0)), warmup = 5000, iter = 10000, chains = 4, cores = 4, backend = "cmdstanr") summary(partPool_sim) partPool_sim$fit
The population (mean of d, b and e) and group level effects (sd of d, b and e’s means) have Rhats around 4 and the inidivual fits sometimes over 1000 !
The estimates themselves are reasonable and suit the measurement points over time (see two added posterior predictive plots).
Thanks so much for your help!
- Operating System: Windows 11Home (22H2)
- brms Version: 2.17.0