I’m trying to understand how to use censored values in modeling with Stan. I have three questions that would be very helpful to have clarified.
- Right vs. left censoring. The censored data section of the Stan User's Guide states:
"Since Stan does not allow unknown values in its arrays or matrices, the censored values must be represented explicitly, as in the following right-censored case."
It then declares a set of y_cens values that are constrained to be larger than a previously defined value U:
real<lower=U> y_cens[N_cens];
Since these values are bounded from below (each must be larger than U), I’m confused about why they are considered right-censored. Could you please explain how left and right censoring are defined?
- Implementation. I’d like to understand how Stan implements censoring. In a sampling statement such as the following, how does Stan ensure that the sampled values of y_cens are larger than U? Is some form of rejection sampling applied?
model {
...
y_cens ~ normal(mu, sigma);
}
- Varying censoring points. Following the notation above, with U as the censoring point, is it possible to define a different bound for each element of an array of parameters? If so, how would it be programmed in Stan? Something like the following:
...
transformed data {
...
array[3] real Us = {1.1,5.2,6.3}; //arbitrary values used for example... perhaps generated by some other function
...
}
...
transformed parameters {
...
array[3] real<lower=Us> t_O;
...
}
...
model {
t_O ~ <some_pdf>(...);
}
In my particular problem, I’ve got a set of observed data points, t, and a set of associated latent variables, t_O. Each t_{O_i} must be less than the corresponding t_i. How would I implement this?
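To make that last case concrete, here is a minimal, untested sketch of what I have in mind. It assumes that my Stan version accepts a vector as a bound (I believe recent releases do, but I have not confirmed which version introduced it) and that the latent variables can be declared in the parameters block rather than in transformed parameters. The names N, mu, and sigma, the normal distribution, and the priors are placeholders I made up for illustration; only t and t_O come from my actual problem.

```stan
data {
  int<lower=1> N;          // number of observations (hypothetical name)
  vector[N] t;             // observed data points
}
parameters {
  real mu;                 // placeholder location parameter
  real<lower=0> sigma;     // placeholder scale parameter
  vector<upper=t>[N] t_O;  // latent values; t_O[i] is constrained below t[i]
}
model {
  mu ~ normal(0, 10);      // placeholder prior
  sigma ~ normal(0, 5);    // placeholder half-normal prior (sigma > 0)
  t_O ~ normal(mu, sigma); // stand-in for <some_pdf>
}
```

If element-wise bounds like this are valid, I would still like to know whether the constraint is enforced through a change of variables or by rejecting draws, which is the point of my second question.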
Thank you!