I have a conceptual question that I'm mulling over before I start using the `loo` package. Briefly, I'm trying to figure out how to specify the pointwise log-likelihoods in the generated quantities block (to be analysed post hoc with the `loo` package in R) when my model includes data augmentation for censored observations. More details below.
I'm working with a dataset in which a subset of observations are left-censored. More specifically, I (assume that I) know the observation is below some threshold concentration (the detection limit of a lab assay), but I don't know where between zero and that threshold it lies. My model does, however, predict a specific concentration. For computational efficiency, I use data augmentation to model the left-censored observations: for each censored observation, I define a parameter `y_log_true` in the parameters block that is bounded to be `<= log(threshold)`. Then, in the model block, I include the statement `y_log_true ~ normal(y_log_hat, sigma);`, where `y_log_hat` is the model-predicted concentration for that data point and `sigma` is the measurement error of the lab assay. This works like a charm, and I find it more computationally efficient than integrating out the observation with a CDF.
Now, when calculating the pointwise log-likelihoods (to be used with `loo` later on) for the censored data points, should I compute the log-likelihood as:

- the log-probability of the observation being censored, given the expectation `y_log_hat` and the measurement error `sigma` (using the CDF, as one would do in the model block when integrating out censored observations), or
- the log-probability density corresponding to the sampling statement in the model block, `y_log_true ~ normal(y_log_hat, sigma);` (including normalising constants, of course)?
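In generated-quantities pseudocode, the two candidates I have in mind would look something like the following (again with placeholder names; only one line would actually be used per option):

```stan
generated quantities {
  vector[N_cens] log_lik_cens;
  for (n in 1:N_cens) {
    // Option 1: log Pr(observation falls below the detection limit),
    // i.e. the CDF-based censored likelihood
    log_lik_cens[n] = normal_lcdf(log_limit | y_log_hat_cens[n], sigma);

    // Option 2: log density of the augmented latent value,
    // mirroring the sampling statement in the model block
    // log_lik_cens[n] = normal_lpdf(y_log_true[n] | y_log_hat_cens[n], sigma);
  }
}
```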