Hello,
I have a working hierarchical model representing a factorial design for an environmental treatment effect. The model works fine, and provides good estimates. However, I made some simplifications in calculations regarding the data input to the model, that I believe actually could be incorporated into the model for a more complete inference. This part I am currently struggling with.
Situation:
Pre and post-treatment I measure total (T) and dissolved (D) concentrations of my compound, but I am actually interested in a third calculated quantity, particulate concentration (P) for my model. But due to measurement differences and how my lab reports the data the quantity P = T - D is not straightforward to calculate. The lab reports, if quantifiable concentrations, a measurement mean and a measurement error interval of approximately 2\sigma. If the concentrations are unquantifiable they report a limit of reporting (typically the case for some compounds post-treatment). So when I read the Stan manual for ideas on how to implement it all, it looks like I have some cross-over between a measurement-error model and left-censored data. However, neither seems to fit out of the box for my intent. I made some naive implementation, but it errors out on log(0) probability. I think I may be missing something fundamental.
Model code below, the particulate vector is what I would ideally want to incorporate in an expanded model including the experimental design. Any pointers on what I have misunderstood?
Any advice would be most welcome!
data {
int<lower = 0> N; // Number of experiments
// Data for total concentration
vector[N] total_mea; // Measured total concentration
vector[N] total_mea_err; // Uncertainty in total concentrations, 2*sigma
int total_bql[N]; // Below quantification limit: 0 or 1
vector[N] total_lor; // if total_bql == 1, this is the limit of reporting
// Data for dissolved concentration
vector[N] dissolved_mea; // Measured dissolved concentration
vector[N] dissolved_mea_err; // Uncertainty in dissolved concentrations, 2*sigma
int dissolved_bql[N]; // Below quantification limit: 0 or 1
vector[N] dissolved_lor; // if dissolved_bql == 1, this is the limit of reporting
}
parameters {
vector[N] total;
vector[N] dissolved;
// Constrain particulate to be >=0
vector<lower = 0>[N] particulate;
}
model {
for (i in 1:N) {
real total_err;
real dissolved_err;
if (total_bql[i]) {
// Sample total concentration is below quantification limit
// Assume it comes from Uniform[0, limit_of_reporting]
total[i] ~ uniform(0, total_lor[i]);
total_err = total_lor[i] / sqrt(12);
} else {
// Sample concentration was measured to mean with uncertainty 2xsigma
total[i] ~ normal(total_mea[i], total_mea_err[i]/2);
total_err = total_mea_err[i]/2;
}
if (dissolved_bql[i]) {
// Sample dissolved concentration is below quantification limit
// Assume it comes from Uniform[0, limit_of_reporting]
dissolved[i] ~ uniform(0, dissolved_lor[i]);
dissolved_err = dissolved_lor[i] / sqrt(12);
} else {
dissolved[i] ~ normal(dissolved_mea[i], dissolved_mea_err[i]/2);
dissolved_err = dissolved_mea_err[i]/2;
}
// Calculate the difference
// TODO:
// currently assumes two random normally distributed variables
// but could be normal - normal, normal - uniform, uniform - normal,
// depending on the assumption of the <LOR distribution
// perhaps create custom lpdf function later
particulate[i] ~ normal(total[i] - dissolved[i], sqrt(total_err^2 + dissolved_err^2));
}
}
Sample data:
compoundA <- list(N = 23L,
total_mea = c(1.16, 1.29, 7.17, 8.21, 2.62, 9.95, 1.89, 1.12, 7.28, 0.949, 0.892,
10.5, 12, 1.21, 1.22, 10.8, 1.1, 1.36, 2.95, 3.97, 0.5, 10.7, 1.16),
total_mea_err = c(0.17, 0.18, 0.73, 0.83, 0.29, 1, 0.23, 0.17, 0.74, 0.158, 0.154,
1.1, 1.2, 0.17, 0.18, 1.1, 0.17, 0.19, 0.32, 0.42, -1, 1.1, 0.17),
total_bql = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
total_lor = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5),
dissolved_mea = c(0.244, 0.415, 0.548, 0.5, 0.871, 0.547, 0.73, 1, 0.584, 0.447,
0.598, 1.16, 0.387, 0.767, 0.626, 0.27, 0.666, 0.5, 0.31, 0.305,
0.306, 0.392, 0.63),
dissolved_mea_err = c(0.027, 0.043, 0.056, -1, 0.088, 0.056, 0.074, 0.1, 0.06,
0.046, 0.061, 0.12, 0.04, 0.078, 0.064, 0.029, 0.068, -1,
0.033, 0.033, 0.033, 0.041, 0.064),
dissolved_bql = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0),
dissolved_lor = c(0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05,
0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05))