Hey all,
I have data from mice where some molecule was measured in the blood over time (after some challenge and after some treatment).
Very low concentrations of the molecule fall below the detection threshold and get reported as 0.
Some experimental groups in my dataset have very high concentrations and no mice with zero levels, in another group all mice have zero concentration, and another group has a few zeros.
I am struggling a bit with how best to model this in brms.
At first I thought about using a hurdle model:

fit = brm(bf(conc ~ treatment * challenge + (1 | mouse), hu ~ treatment * challenge + (1 | mouse)),
          data = data, family = hurdle_lognormal())

Then I realised that you can also think of this data as being left-censored at zero. So I created an extra variable “censored” in my data frame that is “left” for every zero concentration and “none” for the rest.

fit = brm(bf(conc | cens(censored) ~ treatment * challenge + (1 | mouse)),
          data = data, family = hurdle_lognormal())

This is the first time I have had to handle this kind of data, and I was wondering whether either of these two models makes sense. If anybody can tell me whether this is all right or whether I am making some obvious mistake, that would be great :)
If you can suggest a better alternative or know of a good example or resource on this topic (in brms), that would also be welcome.
Thanks in advance!

In general this type of variable can be modelled as left-censored, with whatever error distribution you would otherwise use (so for some concentration, it’s probably going to be normal or log-normal).

A log-normal response would suggest that you can’t have them left-censored at zero. In this case I would want to know what the lower performance limit was for the analytic method, and use that as the censoring limit instead.

These models can have identifiability problems where there is a lot of censoring.

Thanks for your answer!
I have no information on the detection limit. I could maybe replace the zeros with some arbitrary low number (.001) or with the minimum non-zero value in the dataset.
But I am not sure if this would make any difference depending on how these left censored hurdle models are implemented in brms.
I was also not entirely sure if I specified my model correctly in brms to have the desired effect (the second model in my original post)?

Well what would constitute ‘arbitrarily low’ would depend on the context, and this may be rather impactful depending on where the real reporting limit was, so that would take some care. Note that imputing with the lowest observed value would guarantee that your results would be biased. Most biological ‘concentration’ models will make most sense on log-scale, so you wouldn’t be able to have the censoring limit as zero.

The ‘hurdle’ and ‘censored’ concepts are distinct. A hurdle model is a joint model for a binomial and a continuous response (i.e. there are two simultaneous models), and the censored model (usually) integrates out the censored observations to represent everything with one model.
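To make the distinction concrete, here is a sketch of the two likelihoods for a log-normal response. The notation is mine and not from the posts above: $\theta$ is the zero probability (the `hu` parameter in brms), $c$ is an assumed known censoring limit, and $\Phi$ is the standard normal CDF.

```latex
% Hurdle log-normal: a separate probability theta for the zeros,
% and a log-normal density for the strictly positive values.
p_{\text{hurdle}}(y \mid \theta, \mu, \sigma) =
\begin{cases}
  \theta, & y = 0, \\
  (1 - \theta)\,\mathrm{LogNormal}(y \mid \mu, \sigma), & y > 0.
\end{cases}

% Left-censored log-normal: one distribution for everything.
% A censored observation contributes the probability of falling below c.
L_i(\mu, \sigma) =
\begin{cases}
  \Phi\!\left(\dfrac{\log c - \mu}{\sigma}\right), & y_i \text{ censored (true value} < c), \\
  \mathrm{LogNormal}(y_i \mid \mu, \sigma), & y_i \text{ observed}.
\end{cases}
```

In the hurdle model the zeros tell you nothing about $\mu$ and $\sigma$; in the censored model they do, which is why the choice of $c$ matters and why $c = 0$ does not work on the log scale.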

For the censored model you want something like

brm(bf(conc | cens(censored) ~ 1 + ...), family = gaussian(), ...)

where censored is -1 for left-censored observations and 0 for uncensored ones.
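As a rough sketch of how the data could be prepared for that call: the toy data frame, the column name conc_cens and the value of lod below are all made up for illustration, and lod in particular stands in for the real reporting limit, which you would need to get from whoever ran the assay. For a left-censored row the response should hold the censoring limit rather than 0. The model call is wrapped in a function and only outlined, with a log-normal family as one possible choice for a concentration.

```r
# Toy data, invented for illustration only.
dat <- data.frame(
  mouse     = factor(rep(1:6, each = 2)),
  treatment = factor(rep(c("ctrl", "drug"), each = 6)),
  challenge = factor(rep(c("no", "yes"), times = 6)),
  conc      = c(0, 1.2, 0.4, 3.1, 0, 2.5, 0, 0.3, 0.08, 0.9, 0, 0.6)
)

# ASSUMPTION: hypothetical detection limit; replace with the assay's real limit.
lod <- 0.05

# -1 marks left-censored rows, 0 marks fully observed rows.
dat$censored <- ifelse(dat$conc == 0, -1, 0)

# Censored rows carry the limit as their response value, not 0,
# so the model reads them as "somewhere below lod".
dat$conc_cens <- ifelse(dat$conc == 0, lod, dat$conc)

# Model call, not run here (fitting needs brms and a working Stan toolchain).
fit_censored <- function(d) {
  library(brms)
  brm(
    bf(conc_cens | cens(censored) ~ treatment * challenge + (1 | mouse)),
    data = d, family = lognormal()
  )
}
```

Calling fit_censored(dat) would then fit the censored model. With only a handful of rows this toy fit is not meaningful, and refitting with a few plausible values of lod is a cheap way to see how sensitive the results are to that choice.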