Growth models with many groups

Hello,

I am interested in fitting growth models to some data with a variety of treatments/groups in the data. Right now my project has 15 groups (genotypes) but I am hoping to get some advice on general scalability. In the example below the smallData.fit model will run and give good results but the completeData.fit model will fail to initialize.

My initial values are confined to be positive by rgamma since I am using lognormal priors, I have also tried inits=0 and setting inits as a list of lists containing values for a, b, and c for each chain (idea taken from here), but the model has consistently failed to initialize when I have more than 2 groups.

Is there a set of guidelines on how many groups a brms model should be able to handle? Has anyone had a similar situation and found a solution besides running separate models on subsets of the data (that option does work but it seems like a poor workaround)?

library(brms)
library(tidyverse)

## Setup simulated data
growthSim <- function(x,a,b,c){ 
  a_r <- a+rnorm(1,mean = 0,sd=10)
  b_r <- b+rnorm(1,mean=0,sd=2)
  c_r <- c+rnorm(1,mean=0,sd=.035)
  return(a_r*exp(-b_r*exp(-c_r*x)))}
x <- 1:25
smallDf <- rbind(do.call(rbind,lapply(letters[1:2], function(L) do.call(rbind, lapply(1:20,function(i) data.frame("sample"=paste0("sample_",i),"treatment"=L,"time"=x,"y"=growthSim(x,200-(5*which(letters==L)),13-(.2*which(letters==L)),.2+(0.001*which(letters==L))),stringsAsFactors = F))))))

completeDf <- rbind(do.call(rbind,lapply(letters[1:15], function(L) do.call(rbind, lapply(1:20,function(i) data.frame("sample"=paste0("sample_",i),"treatment"=L,"time"=x,"y"=growthSim(x,200-(5*which(letters==L)),13-(.2*which(letters==L)),.2+(0.001*which(letters==L))),stringsAsFactors = F))))))

completeDf%>%
  ggplot(aes(time,y,group=interaction(treatment,sample)))+
  geom_line(aes(color=treatment), show.legend=F)

## Define Priors

prior1 <- prior(lognormal(log(130), .25),nlpar = "a") +
  prior(lognormal(log(12), .25), nlpar = "b") + 
  prior(lognormal(log(1.2), .25), nlpar = "c") + 
  prior(student_t(3,0,5), dpar="sigma") +
  prior(gamma(2,0.1), class="nu")

## Run models

smallData.fit <- brm(bf(y ~ a*exp(-b*exp(-c*time)), 
               sigma~s(time,by=treatment), 
               a + b + c ~ 0+treatment,
               autocor = ~arma(~time|sample:treatment,1,1),nl = TRUE),
            family = student, prior = prior1, data = smallDf, iter = 2000, 
            cores = 4, chains = 4, backend = "cmdstanr", 
            control = list(adapt_delta = 0.999,max_treedepth = 20),
            inits = function(){list(b_a=rgamma(2,1),b_b=rgamma(2,1),b_c=rgamma(2,1))})

completeData.fit <- brm(bf(y ~ a*exp(-b*exp(-c*time)), 
               sigma~s(time,by=treatment), 
               a + b + c ~ 0+treatment,
               autocor = ~arma(~time|sample:treatment,1,1),nl = TRUE),
            family = student, prior = prior1, data = completeDf, iter = 2000, 
            cores = 4, chains = 4, backend = "cmdstanr", 
            control = list(adapt_delta = 0.999,max_treedepth = 20),
            inits = function(){list(b_a=rgamma(2,1),b_b=rgamma(2,1),b_c=rgamma(2,1))})

Error messages from trying to make completeData.fit for all chains are:

Chain 4 Rejecting initial value:
Chain 4   Log probability evaluates to log(0), i.e. negative infinity.
Chain 4   Stan can't start sampling from this initial value.
Chain 4 Initialization between (-2, 2) failed after 100 attempts.
Chain 4  Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model.
Warning: Chain 4 finished unexpectedly!

Also this is my first post so if I should provide more information or I have tagged this topic incorrectly please let me know! Thank you!

So does your model run without the grouping/treatments? And what does the simple model’s diagnostics look like (are there errors or warnings?).

The model runs with only one genotype and I have used a lot of very similar models comparing two treatment groups and those work well. The amount of time the models take is pretty variable even with really similar/identical data.

Recently I started seeing these warnings when the Stan code compiles:

Warning in '/var/folders/t6/mhdppcf93w5_21y8nv0s7m_40000gn/T/RtmpODvZUa/model-d7f16a179603.stan', line 43, column 2: Declaration of arrays by placing brackets after a variable name is deprecated and will be removed in Stan 2.32.0. Instead use the array keyword before the type. This can be changed automatically using the auto-format flag to stanc

And this warning has been a constant when using lognormal priors.

Warning message: It appears as if you have specified a lower bounded prior on a parameter that has no natural lower bound. If this is really what you want, please specify argument 'lb' of 'set_prior' appropriately. Warning occurred for prior b_a ~ lognormal(log(130), 0.25) b_b ~ lognormal(log(12), 0.25) b_c ~ lognormal(log(1.2), 0.25)

Rhats are all 1 to 1.01 with the version of the model using smaller data that will initialize.

I went ahead and ran that example model again just now with 2 chains/cores (computer is doing other things at the moment), I’ll attach the full summary of that model at the end of this comment.

The completeData.fit model always fails to initialize. The only thing I have thought of is that there is some prior on an intercept term not shown by get_prior() and that part of the model is using default inits on that variable which then causes the failure. But I would have thought inits=0 would solve the problem if that were the case.

> summary(smallData.fit)
 Family: student 
  Links: mu = identity; sigma = log; nu = identity 
Formula: y ~ a * exp(-b * exp(-c * time)) 
         autocor ~ tructure(list(), class = "formula", .Environment = <environment>)
         sigma ~ s(time, by = treatment)
         a ~ 0 + treatment
         b ~ 0 + treatment
         c ~ 0 + treatment
   Data: smallDf (Number of observations: 1000) 
  Draws: 2 chains, each with iter = 1000; warmup = 0; thin = 1;
         total post-warmup draws = 2000

Smooth Terms: 
                             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sds(sigma_stimetreatmenta_1)     6.81      2.39     3.83    12.64 1.00      617
sds(sigma_stimetreatmentb_1)     5.30      1.84     2.90    10.10 1.00      630
                             Tail_ESS
sds(sigma_stimetreatmenta_1)     1061
sds(sigma_stimetreatmentb_1)     1116

Population-Level Effects: 
                         Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
sigma_Intercept              1.91      0.04     1.83     1.99 1.00     1799
a_treatmenta               185.62      3.44   178.97   192.58 1.00     1809
a_treatmentb               185.35      2.77   180.04   190.95 1.00     1725
b_treatmenta                13.20      0.29    12.66    13.82 1.00     1524
b_treatmentb                11.83      0.34    11.24    12.54 1.00     1481
c_treatmenta                 0.20      0.00     0.20     0.21 1.00     1462
c_treatmentb                 0.20      0.00     0.20     0.21 1.00     1329
sigma_stime:treatmenta_1    31.97      4.74    23.00    41.27 1.00     1431
sigma_stime:treatmentb_1    24.07      4.23    16.03    32.59 1.00     1065
                         Tail_ESS
sigma_Intercept              1527
a_treatmenta                 1306
a_treatmentb                 1495
b_treatmenta                 1225
b_treatmentb                 1406
c_treatmenta                 1182
c_treatmentb                 1357
sigma_stime:treatmenta_1     1547
sigma_stime:treatmentb_1     1133

Family Specific Parameters: 
   Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
nu     4.80      0.76     3.58     6.53 1.00     1655     1420

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
1 Like

In a new version of brms I now have a new warning message which makes me think this could be related to the splines initialization.

Missing init values for the following parameters:
 - chain 1: Intercept_sigma, bs_sigma, zs_sigma_1_1, sds_sigma_1_1, zs_sigma_2_1, sds_sigma_2_1, zs_sigma_3_1, sds_sigma_3_1, nu
 - chain 2: Intercept_sigma, bs_sigma, zs_sigma_1_1, sds_sigma_1_1, zs_sigma_2_1, sds_sigma_2_1, zs_sigma_3_1, sds_sigma_3_1, nu
 - chain 3: Intercept_sigma, bs_sigma, zs_sigma_1_1, sds_sigma_1_1, zs_sigma_2_1, sds_sigma_2_1, zs_sigma_3_1, sds_sigma_3_1, nu
 - chain 4: Intercept_sigma, bs_sigma, zs_sigma_1_1, sds_sigma_1_1, zs_sigma_2_1, sds_sigma_2_1, zs_sigma_3_1, sds_sigma_3_1, nu

So I tried setting the inits for those variables explicitly with the list below but that did not change anything.

init = function(){list(b_a=rgamma(2,1),b_b=rgamma(2,1),b_c=rgamma(2,1), 
                                   Intercept_sigma=runif(2,1,3), bs_sigma=runif(2,1,3),
                                   zs_sigma_1_1=runif(2,1,3),sds_sigma_1_1=runif(2,1,3),
                                   zs_sigma_2_1=runif(2,1,3), sds_sigma_2_1=runif(2,1,3), nu=runif(2,1,3))}

I also stumbled into a comment on a post about change-point/piecewise models suggesting recoding categorical variables as dummy variables (although that was in response to a different error than I have had), I gave that a shot with a subset of the data and did not see any improvements.

dummyCompleteDf<-completeDf%>%
  mutate(sample2=sample, treatment2=treatment)%>%
  pivot_wider(names_from = treatment2, values_from=sample2, names_prefix = "trt_")%>%
  mutate(across(.cols = contains("trt_"), .fns = ~ifelse(is.na(.x),0,1)))
dummyData.fit <- brm(bf(y ~ a*exp(-b*exp(-c*time)), 
                              sigma~s(time,by=treatment), 
                              a + b + c ~ 0+trt_a+trt_b+trt_c,
                              autocor = ~arma(~time|sample:(treatment),1,1),
                           nl = TRUE),
                           family = student, prior = prior1, data = dummyCompleteDf%>%filter(treatment %in% c("a", "b", "c")), iter = 1000,  # I started by trying with just a few groups 
                           cores = 4, chains = 4, backend = "cmdstanr", 
                           control = list(adapt_delta = 0.999,max_treedepth = 20),
                           init = 0) # tried with variety of inits