Improve model on log-normal data with heterogenous variability


I’m working with data which are characterised by :

  • log-linear accross time : log(Y)=a+b*Day
  • Repeated measurements
  • Possibility of complete regression i.e. subjects with regressive tumor volume
  • Variability inter and intra groups

Here is a example of what the data could look like :

Here is the code :

nlprior <- c(prior(normal(0, 100),nlpar = "alpha"),
               prior(normal(0, 100),nlpar = "beta"),
               prior(gamma(0.01, 0.01), class=sd, nlpar = "alpha"),
               prior(gamma(0.01, 0.01), class=sd, nlpar = "beta"),
               prior(gamma(0.01, 0.01), class=sigma)
  nlform <- bf(log_value ~ alpha+beta*(day),
                 alpha ~(1|gr(Animal_ID, by=Group_no))+Group_no,
                 beta ~ (1|gr(AnimaI_ID, by=Group_no))+Group_no,
                 nl = TRUE)

fit <- brm(formula = nlform,
               data = data,
               sample_prior = "yes",
               family = gaussian(),
               chains = 4,
               cores= parallel::detectCores(),
               iter = 100000,
               thin = 5,
               seed = 42,
               control = list(adapt_delta = 0.95, max_treedepth = 15)

Here are my issues :

  • Is it possible to only put the gr(Animal_ID, by=Group_no) on beta and not alpha ? it’s not running if I only put on beta.
  • The code made several hours (even days) to compute, is it a way to improve that ?
  • I used hypothesis function to compare each group to the first one. Is it necessary to adjust for multiplicity just like in frequentist. If yes, how ? If no, why ?

I can’t provide the data but I hope it’s clear.
Thanks a lot

1 Like

one thing that feels weird is that I think a model equivalent to yours could be written directly in the normal (linear) brms syntax. Something like:

log_value ~ Group_no * day + gr(1 + day, Animal_ID, by = Group_no, cor = FALSE) 

should be equivalent or almost equivalent. Is there a reason for your use of the non-linear syntax? (e.g. if you plan to extend the model further and that would require non-linear terms).

I don’t see a reason why it should be a big problem in principle - what do you mean by “not running”? Do you get an error? Or is the model very slow? Or anything else?

Probably yes. You are using a lot of iterations - Stan generally needs much less iterations than say JAGS as it tends to explore more efficiently. You also have high adapt_delta and max_treedepth. If the model gives you warnings with lower number of iterations/adapt_delta/max_treedepth, it is a signal that there is something problematic with the model. Most likely you just don’t have enough data to learn anything useful about all of your parameters.

Additionally, we usually don’t recommend this type of gamma prior on the sd parameters as it tends to put too much prior weight on large sds. So unless you have a good reason to prefer the gamma prior, I would rather use something like normal(0, half_the_variability_you_roughly_expect).

Feel free to ask for further clarifications if anything is unclear.

Best of luck with your model!