Two-part mixture model on left-censored biomarker data

I am trying to model blood biomarker data collected longitudinally in a sample of patients vs. controls. There are 4 sampling timepoints, with patient data collected either at timepoint 1 or 2 (I have considered combining them into one timepoint, but biologically speaking it is better to keep them separate). The biomarker concentrations are heavily left-censored, with a large proportion of values below assay detection limits (< LOD, or non-detects).

I believe the best modelling strategy is a two-part (hurdle) mixed-effects model: a Bernoulli part predicting detects vs. non-detects, and a left-truncated lognormal part for the detected concentrations. Please advise if this is wrong. I originally considered a mixed-effects censored (Tobit) regression, but I found that the large proportion of non-detects affected the parameter estimates too much.

Example code for one biomarker is below:


  library(brms)

  cyt <- "IL_7"
  resp_censobs <- paste0(cyt, "_censobs")  # all non-detects recoded to the LOD
  resp_detect  <- paste0(cyt, "_detect")   # detected? TRUE/FALSE
  lod <- LOD_vals[[cyt]]

  # Bernoulli part: probability of detection
  bf_det <- bf(
    as.formula(paste0(resp_detect,
                      " ~ Group_Mixed * timepoint + Sex + Age_centred + (1 | p | Subj_ID)")),
    family = "bernoulli"
  )

  # Positive part: left-truncated lognormal for the observed concentrations
  bf_pos <- bf(
    as.formula(paste0(resp_censobs, " | trunc(lb = ", lod,
                      ") ~ Group_Mixed * timepoint + Sex + Age_centred + (1 | p | Subj_ID)")),
    family = "lognormal"
  )

  fit_hurdle <- brm(
    bf_det + bf_pos + set_rescor(FALSE),
    data = df,
    backend = "cmdstanr",
    chains = 4, cores = 8,
    prior = priors,
    iter = 4000,
    warmup = 1000,
    save_pars = save_pars(all = TRUE),
    control = list(adapt_delta = 0.999, max_treedepth = 15),
    init = 0
  )

Please let me know if this is the correct modelling strategy and if there are any fixes you would recommend.

Seems reasonable. The part I’d worry about most is (1 | p | Subj_ID) on the Bernoulli side of the model. It is fine as long as at least some subjects have both detections and non-detections across their repeated measurements. Otherwise the variance of this random effect blows up, and the model can become degenerate or highly sensitive to the prior on that variance. I gather that subjects are typically measured only three times; again, as long as at least some subjects have both detections and non-detections, there shouldn’t be a big problem.

Thanks! Yes, it is normal for some subjects to switch between detects and non-detects. I intended this term to account for correlation between the two parts of the model, and the estimated correlation seems reasonable in all models (around 0.5-0.6). I will admit I don’t fully understand how that parameter works, though. Would you recommend only including the |p| term in the positive part of the model?

I would include that estimated correlation, yes. The |p| notation only does anything, in this particular formula, when it is included in both parts of the model. It says that the random effects of Subj_ID on the hurdle part and the lognormal part should be modelled as bivariate normal, with the degree of correlation estimated from the data. Otherwise they would be modelled as independent normals, or in other words as a bivariate normal with the population-level correlation fixed to zero.
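To see what the |p| term estimated, you can inspect the group-level parameters of the fitted object. A minimal sketch, assuming the `fit_hurdle` object from the code above (the exact parameter names depend on your response column names):

```r
library(brms)

# Group-level SDs and the cross-part correlation induced by the |p| term
VarCorr(fit_hurdle)$Subj_ID

# To work with the correlation draws directly, first list the parameter
# names (look for one starting with "cor_Subj_ID__"):
# variables(fit_hurdle)
# then extract that parameter with as_draws_df(fit_hurdle, variable = ...)
```

The `cor(...)` row is the posterior for the correlation between a subject's detection-propensity intercept and their concentration-level intercept.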

Thanks! One more question: I would also like to test for interactions between age, sex and group/time. From what I have understood about Bayesian methods, it seems possible to include all relevant interactions in the model and simply leave them there. This is my first foray into Bayesian methods, so I appreciate the help. With frequentist methods, I am used to removing non-significant terms, since leaving them in reduces statistical power to detect other effects of interest; in that case I would use a best-subsets regression to identify important interactions. But with a Bayesian model, is it OK to just include all interactions of interest and then assess their importance based on credible intervals and the probability of a positive/negative effect?

I wouldn’t recommend iteratively dropping non-significant predictors as a model selection procedure in either Bayesian or frequentist applications.

If the goal is to develop a model that maximizes predictive power, then we can use cross-validation or approximations to it (PSIS-LOO in the Bayesian setting, which is probably better than AIC in the frequentist setting) to select a model that is expected to be the best at predicting unseen data from the same population.
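In brms this comparison is a few lines. A sketch with hypothetical model names (both models must be fit to the same rows of data):

```r
library(brms)

# PSIS-LOO estimates of expected out-of-sample predictive performance
loo_full    <- loo(fit_full)      # model with the interactions
loo_reduced <- loo(fit_reduced)   # model without them

# Ranks the models by elpd; an elpd difference small relative to its
# SE means the data cannot distinguish the models predictively
loo_compare(loo_full, loo_reduced)
```

Check the Pareto-k diagnostics that `loo()` prints; high values mean the approximation is unreliable for those observations.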

If the goal is to develop a model in which covariates are interpretable causally, then controlling for potential confounding is fundamentally important. If the model with all the terms you think might be causally relevant cannot distinguish which effects are non-zero, it does not follow that the effects you recover by sequentially dropping non-significant terms are the causal ones. It means the data don’t contain enough information to reliably identify which effects are causal and what their magnitudes are, given your mental model of everything that could potentially be important.

However, one important feature of Bayesian methods here is that they will guard against overfitting, by properly accounting for the full posterior uncertainty in the model with everything. So in that sense, yes you can toss in everything that you think is important, and do straightforward inference on the model output. (Whereas with frequentist tools, you might end up with a badly overfitted point estimate, with weak or variable assurances about whether the uncertainty in that estimate is being adequately captured, depending on the details of the model and the software.)

Yes, your last point is exactly what I had understood about Bayesian models, so that is great! My goal here is inference: I am not interested in maximising predictive power per se, except insofar as it helps me find the most appropriate interpretation of the effects. I must say I am not sure what you mean by “interpreted causally”, as I certainly haven’t worked with any statistics that would allow for causal inference. Basically, I just want to investigate group differences (patients vs. controls) in my biomarkers across time, controlling for age and sex, and I want to know if there are any important interactions there. Would you still recommend cross-validation in that case?

Here’s the question to ask, in a bare-bones example.

Suppose y, a, and b are all quite correlated with each other. If you fit y ~ a you recover a strong significant association. If you fit y ~ b you recover a strong significant association. If you fit y ~ a + b the model knows that at least one of these associations should be strong, but it cannot confidently conclude that either one in particular is strong.
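That scenario can be checked with a quick simulation (a base-R sketch; the numbers are illustrative):

```r
# a and b are nearly collinear, and both are associated with y
set.seed(1)
n <- 200
a <- rnorm(n)
b <- a + rnorm(n, sd = 0.1)   # corr(a, b) is about 0.995
y <- a + b + rnorm(n)

# Each predictor alone: a strong, precise association
coef(summary(lm(y ~ a)))["a", ]
coef(summary(lm(y ~ b)))["b", ]

# Together: the standard errors inflate roughly tenfold
# (sqrt of the variance inflation factor, here ~100), so neither
# coefficient can be pinned down individually
coef(summary(lm(y ~ a + b)))[c("a", "b"), ]
```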

If your goal is prediction, you don’t care which one of a or b you drop. You pick one and you have a strongly predictive model.

If your goal is causal inference, then unless you have a priori grounds to prefer one causal story over the other, you definitely cannot rule in either causal effect, and you would want to conclude that the effects of both a and b are uncertain, though at least one of them must be strong.

Is your inferential goal more like the predictive one, or more like the causal one? Would you be perfectly happy to grab either a or b at random, and showcase that you have a nice predictive model? Or do you really care about understanding what you can conclude about the effect of a independent of b (i.e. the effect of a while controlling for b) or vice versa?

The answer to that question determines how you might approach this problem.

Edit: for the second goal, you don’t have to literally believe that either a or b is causal; you just have to be interested in the question of which of a or b is a better instrument for the actual causal process. You have to care about whether your inference on a controls for potential confounding effects of b, and vice versa.

For inferential purposes, we would want to keep both a and b, which we would do by combining them using PCA or by averaging their standardised values. So my goals are much more aligned with the first, not the second. In this case though, I am only testing for interactions, so I would prefer to drop anything non-significant for parsimony. Age and sex need to be in the models, but their interactions don’t necessarily need to be.

This sounds like you are more interested in prediction, and I’d recommend using cross-validation (e.g. PSIS-LOO) to compare models and select a good one.

Just to say it one final time: if the model with an interaction is less predictive than the model without, then in the predictive framework you can drop it. But if you actually care about understanding effects of interest while controlling for this potential interaction, then you cannot drop the interaction term simply because it degrades predictive quality or because it’s non-significant. If the data can constrain the interaction to be very close to zero, then its presence won’t affect the rest of the inference anyway, and there’s no benefit to dropping it. If the data can’t constrain it to be near zero, then the data are telling you that this interaction really might matter in an important way, even if they cannot be confident that it does matter, or even of the direction of its impact. In that case, dropping it would mean fixing it to zero by assumption, irrespective of whether the data suggest it’s likely to be near zero.

Hmm, but I am actually interested in testing ALL the interactions. It sounds like you’re explaining exactly what my original question was getting at. I’m not really interested in generating the most predictive model; I want to know what the effects of all my predictors and their interactions are. In frequentist frameworks, adding too many redundant predictors reduces power to detect true effects, which is why it’s better to remove non-significant interactions/variables from your model. From what you’ve said, it seems I can just as easily leave them in in a Bayesian framework, and in fact then have even more confidence to say “this interaction is 0, based on my data”.

In a Bayesian framework you will never be able to say that something is zero, just that it’s small. Likewise, in a frequentist framework you don’t ever really conclude that something is zero, just that it’s non-significant.

Including lots of low quality predictors will degrade the Bayesian model’s predictive performance and widen its uncertainty intervals around the effects of interest, unless the data are actually able to constrain these low-quality effects to be close to zero. With too many such effects, the data will get overwhelmed and uncertainty intervals will blow up.
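You can see a frequentist analogue of this interval-widening in a quick simulation (a sketch; the Bayesian posterior behaves similarly when the priors are weak):

```r
# Adding many uninformative predictors widens the uncertainty
# around the effect of interest
set.seed(2)
n <- 100
x <- rnorm(n)                       # the predictor we care about
junk <- matrix(rnorm(n * 50), n)    # 50 pure-noise covariates
y <- x + rnorm(n)

se_alone <- coef(summary(lm(y ~ x)))["x", "Std. Error"]
se_junk  <- coef(summary(lm(y ~ x + junk)))["x", "Std. Error"]

# The standard error for x is noticeably larger with the junk
# covariates included, even though none of them has any true effect
c(alone = se_alone, with_junk = se_junk)
```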

The problem, in both frequentist and Bayesian settings, is that if you actually think a priori that these effects might matter, then observing that they are non-significant or have very wide uncertainty intervals is not evidence that they don’t matter. It’s just evidence that you still don’t know whether they matter. The inflated posterior uncertainty, including the inflated intervals around the effects of interest, actually is the correct posterior when neither you nor the data know enough to rule out these interaction effects. Observing that they’re non-significant in a frequentist setting doesn’t rule them out, and dropping them is saying “let’s suppose they’re zero even though I think they might matter and the data also think they might matter (the data just aren’t certain that they do)”.

Got it! It seems to just be a theoretical choice, then. I’ll see what I can do about cross-validation.