Clarification about definitions associated with "censored values"

twistedVine · July 17, 2023, 3:04pm

I’m trying to understand how to use censored values in modeling with Stan. I have three questions that I would appreciate help with.

Right vs. left censoring. The censored data section of the user guide states:

Since Stan does not allow unknown values in its arrays or matrices, the censored values must be represented explicitly, as in the following right-censored case.

It then declares a set of y_cens values that are required to be larger than a previously defined value U:

real<lower=U> y_cens[N_cens];

Since they are larger than a value on the left, I’m confused why these values are considered to be right censored. Could you please explain how left and right censoring are defined?

I’d like to understand how censoring is implemented by Stan. In a sampling statement such as the following, how does Stan ensure that the sampled parameter values are, in this case, larger than U? Is some form of rejection sampling applied?

model {
  ...
  y_cens ~ normal(mu, sigma);
}

Following the notation above, with U as the endpoint used for censoring, is it possible to define a different censoring bound for each element of an array of parameters? If so, how would it be programmed in Stan? Something like the following:

...
transformed data {
...
array[3] real Us = {1.1,5.2,6.3}; //arbitrary values used for example... perhaps generated by some other function
...
}
...
transformed parameters {
...
array[3] real<lower=Us> t_O;
...
}
...
model {
t_O ~ <some_pdf>(...)
}

In my particular problem, I’ve got a set of observed data points, t, and a set of associated latent variables, t_O. Each t_{O_i} must be less than the corresponding t_i. How would I implement this?

Thank you!

jsocolar · July 17, 2023, 5:53pm

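Right-censoring is when the right-hand tail of the distribution is censored: any value above some cutoff U is not observed exactly and is only known to exceed U. y_cens are these censored values, so each y_cens must be larger than U. Left-censoring is the mirror image, where values below a cutoff are only known to fall below it.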
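
As a point of reference, here is a sketch of a complete right-censored model, closely following the User's Guide example but written in the newer array syntax. The normal likelihood and the names N_obs, y_obs, mu, and sigma come from that example and are not part of the question above, so treat them as assumptions.

```stan
data {
  int<lower=0> N_obs;            // number of fully observed values
  int<lower=0> N_cens;           // number of right-censored values
  array[N_obs] real y_obs;       // observed values
  real<lower=max(y_obs)> U;      // censoring point; everything above U was censored
}
parameters {
  // Censored values are unknown, so they are parameters.
  // Right-censoring: each one is only known to lie above U.
  array[N_cens] real<lower=U> y_cens;
  real mu;
  real<lower=0> sigma;
}
model {
  y_obs ~ normal(mu, sigma);
  y_cens ~ normal(mu, sigma);
}
```

For left-censoring at a cutoff L, the same pattern would apply with the declaration changed to use <upper=L> instead of <lower=U>.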

Sampling statements don’t actually cause Stan to draw a random sample from the distribution on the right-hand side; they just increment the target density by the appropriate lpdf. So no rejection sampling is involved. Stan ensures that the sampled parameter values are larger than U by constructing an unconstrained parameter under the hood, taking y_cens to be U plus the exponential of this unconstrained parameter, and adding the relevant Jacobian adjustment to the target. The code tells Stan to do this in the parameters block, where y_cens is declared with a lower-bound constraint.
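
To make the transform concrete, here is a sketch of roughly what the lower-bound declaration amounts to if written out by hand. This is an illustration, not Stan's actual internals; y_raw is a hypothetical name for the unconstrained parameter, and the normal model is carried over from the question.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  vector[N_obs] y_obs;
  real U;
}
parameters {
  real mu;
  real<lower=0> sigma;
  vector[N_cens] y_raw;          // unconstrained, can be any real number
}
transformed parameters {
  // exp(y_raw) is always positive, so y_cens is always greater than U
  vector[N_cens] y_cens = U + exp(y_raw);
}
model {
  y_obs ~ normal(mu, sigma);
  y_cens ~ normal(mu, sigma);    // just adds normal_lpdf(y_cens | mu, sigma) to target
  // Jacobian adjustment: log |d y_cens / d y_raw| = log(exp(y_raw)) = y_raw
  target += sum(y_raw);
}
```

Declaring y_cens with <lower=U> in the parameters block should give the same posterior without the manual transform or the explicit Jacobian term.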

Recent versions of Stan (but maybe not 2.21, which is still the RStan version on CRAN?) allow you to pass vectors of bounds to <lower=...> (and <upper=...>). However, you need to declare the relevant parameter in the parameters block, not in the transformed parameters block.
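
For the last case in the question, where each t_O[i] must be less than the corresponding observed t[i], a minimal sketch could look like the following. It assumes a Stan version that accepts vector-valued bounds, and the normal(0, 10) prior is only a placeholder for whatever distribution the real model uses.

```stan
data {
  int<lower=0> N;
  vector[N] t;                   // observed data points
}
parameters {
  // Element-wise upper bound: t_O[i] < t[i] for every i
  vector<upper=t>[N] t_O;
}
model {
  t_O ~ normal(0, 10);           // placeholder prior; replace with the actual model
}
```

If the distribution on t_O has parameters that are themselves being estimated and it is meant to be a truncated distribution, the truncation normalizing term would also need to be accounted for; with fixed constants as in this sketch it can be ignored.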

twistedVine · July 18, 2023, 1:02am

Thank you! This is really interesting and helps clarify things. Good to know that more recent versions of Stan accept vectors as constraints. This should help a lot!
