Mixture growth model with non-constant sigma for qPCR data

My growth model estimates random intercepts & random gradients for each individual, longitudinally in time. Its current specification is working. I would now like to make the intercept term belong to a 2-component Normal mixture distribution.

1) Current growth model (working)

The response is y and the growth term is time. Censoring is also included. The ID1 term allows correlation between parameters. The |SUBJ term is the hierarchical bit of the model and accounts for repeated measures by individual SUBJ. The traditional constant sigma term (i.e. sigma ~ 1) is replaced with one depending on a (non-linear) exponential of the response using extra parameters sigmaA, sigmaB & sigmaC.

get_prior(data = dataset,
          family = gaussian(),
          formula = bf(y | cens(cen1) ~ popnintercept + time) +
            lf(popnintercept + time ~ 1 + (1|ID1|SUBJ)) +
            nlf(sigma ~ sigmaA * exp(sigmaB * y) + sigmaC) +
            lf(sigmaA + sigmaB + sigmaC ~ 1) + 
            set_nl(nl = TRUE) 
)

This produces the following terms which I’ve rewritten here to help my understanding. Note they don’t match the exact brms syntax: cor, popnintercept(b), popnintercept(sd), sigmaA(b), sigmaB(b), sigmaC(b), time(b), time(sd).

2) First attempt (successful)

I have managed to split the intercept into mu1 & mu2:

mix = mixture(gaussian, gaussian)
get_prior(data = dataset,
          family = mix,
          formula = bf(y | cens(cen1) ~ 1 + time +
                                    (1 + time|ID1|SUBJ))
) 

It now contains cor, sigma1, sigma2, theta, mu1(Intercept), mu1(sd), mu2(Intercept), mu2(sd), time_mu1(b), time_mu1(sd), time_mu2(b), time_mu2(sd).

3) Non-constant sigma1 & sigma2 (unsuccessful)

Building on 2) above:

get_prior(data = dataset,
          family = mix,
          formula = bf(y | cens(cen1) ~ 1 + time +
                                  (1 + time|ID1|SUBJ)) +
            nlf(sigma1 ~ sigmaA1 * exp(sigmaB1 *y) + sigmaC1) +
            lf(sigmaA1 + sigmaB1 + sigmaC1 ~ 1) +
            nlf(sigma2 ~ sigmaA2 * exp(sigmaB2 * y) + sigmaC2) +
            lf(sigmaA2 + sigmaB2 + sigmaC2 ~ 1) +
            set_nl(nl = TRUE)
) 

But it gave Error: The following variables can neither be found in 'data' nor in 'data2': 'ID1'.

Is it possible that brms can’t cater for this specification yet?

  • Operating System: 5.10.16.3-microsoft-standard-WSL2
  • brms Version: 2.18.0

Hello :)

I have no specific experience with this time of model, but something bugs me :

Is it normal that your sigmas (sd of the mixture components) depends upon your response variable (y)? I do not really understand what kind of result it could produce!

All the best,
Lucas

Hi @ldeschamps, a good question and here is some context. The response y is actually the C_t (or C_q) value obtained during a qPCR assay. It is widely accepted that the precision of the measurement gets worse exponentially as the C_t value increases: this explains why the sigma has been made to be response-dependent. This has been validated in previous models that used a variable sigma to give better fitting models (lower LOOIC).

My reasoning for then having two different sigma is because I believe there are two separate processes—one generating the qPCR readings from truly-infected people at lower C_t, and another generating (different) background noise at higher C_t, therefore I wanted to account for this by using sigma1 & sigma2.

Oh, thanks for the explainations! This is really interesting.

This remind me the mean-variance functions of glms and other non normal models. I guess, in your case the mean-variance function varies among datasets, and it would not be satisfying to work with a mixture of families with defined mean-variance function?

Sorry, I bring a lot of question but I don’t know a technical solution to your problem :(

All the best,
Lucas

I have tried with a function for sigma(y; sigma1, sigma2, sigma3) informed with exact values (not estimating the sigma1, sigma2, sigma3 parameters) from earlier qPCR protocol papers, but have found that it gives models that don’t fit as well as when I use learn sigma1, sigma2, sigma3 from the data.

I guess this isn’t surprising considering that those applying the qPCR protocol to their study will do it with different lab conditions and probably suboptimally to the reference lab.