Censored data with varying, known censoring points

Harry_Southworth · June 2, 2020, 10:43am

I have a regression problem in which the response variable is sometimes left-censored at a known point, but the censoring point is different for each observation. I’m using rstan as the wrapper.

I have code, largely copied from section 4.3 of the User’s Guide, for a known, unique censoring point, below.

data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  real y_obs[N_obs];
  real L;
  real sigma_params[2];
}
parameters {
  real<upper=L> y_cens[N_cens];
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  y_obs ~ normal(x_obs * beta, sigma);
  y_cens ~ normal(x_cens * beta, sigma);
}

I want to change the specification of the censoring point:

real L[N_cens];
...

Of course, that breaks the declaration of y_cens.

Is there a way to declare vector y_cens with a known upper limit for each element? I'm guessing it can be done in a loop, but I'm too much of a noob to know how.

Any help will be appreciated.

martinmodrak · June 12, 2020, 8:19pm

Sorry we took so long to respond.

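First, you don’t actually need the varying bounds on y_cens: the manual goes on to show a more efficient implementation where those parameters are integrated out. The censoring points L[n] then enter through the normal CDF instead of through a parameter constraint. So you would need something like: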

// N_cens, x_cens, beta and sigma as declared in the model above
for (n in 1:N_cens) {
  target += normal_lcdf(L[n] | x_cens[n] * beta, sigma);
}
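Putting that loop into the original program might look like the sketch below. It is an illustration, not tested code from this thread: it assumes L holds one censoring point per censored row, and it switches y_obs, L and sigma_params to vector types (my choice) so that normal_lcdf can be called once on the whole vector.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  vector[N_obs] y_obs;
  vector[N_cens] L;        // assumed: one known censoring point per censored row
  vector[2] sigma_params;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;     // no y_cens parameter: it has been integrated out
}
model {
  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  y_obs ~ normal(x_obs * beta, sigma);
  // each censored row contributes log Pr[y < L[n]]; the vectorised call sums them
  target += normal_lcdf(L | x_cens * beta, sigma);
}
```

Because y_cens no longer appears as a parameter, there is no declaration left to break when L varies by observation.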

If you actually need varying bounds (which I think is not the case), then the best way is to leave the parameters unconstrained and apply the constraining transforms yourself (the transforms are described at https://mc-stan.org/docs/2_23/reference-manual/lower-bound-transform-section.html).
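A minimal sketch of that manual transform for upper bounds, assuming the same data block as the original model but with L declared as vector[N_cens]. The name y_cens_raw is hypothetical. Each element is mapped as y_cens[n] = L[n] - exp(y_cens_raw[n]), which keeps it below L[n], and the log Jacobian of that map is y_cens_raw[n].

```stan
parameters {
  vector[N_cens] y_cens_raw;   // hypothetical unconstrained parameter
  vector[K] beta;
  real<lower=0> sigma;
}
transformed parameters {
  // upper-bound transform applied elementwise: y_cens[n] < L[n] for every n
  vector[N_cens] y_cens = L - exp(y_cens_raw);
}
model {
  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  y_obs ~ normal(x_obs * beta, sigma);
  // Jacobian adjustment: |d y_cens / d y_cens_raw| = exp(y_cens_raw)
  target += sum(y_cens_raw);
  y_cens ~ normal(x_cens * beta, sigma);
}
```

Newer Stan releases may also accept a vector-valued bound directly in the declaration; I have not checked which version introduced that, so consult the reference manual for the version you are running.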

Best of luck!

Christopher_Campbell · October 7, 2021, 8:55pm

Hi @martinmodrak, thanks for this example.

This example confused me initially because the documentation states the probability under censoring incorrectly on this page: 4.3 Censored data | Stan User’s Guide.

The issue is that the final equation in the first block should be $1 - \Phi((U - \mu)/\sigma)$, not $1 - \Phi((y - \mu)/\sigma)$.

The code is correct as written (on the page and in your sample code above). However, the statement “you don’t actually need the varying bounds” is incorrect or unclear, since both the code and the corrected expression depend on the varying bounds.
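To make the dependence explicit for the left-censored case in this thread, here is a sketch of the log likelihood, assuming normal errors. The notation $\mu_n = x_n^\top \beta$ and the per-observation censoring point $L_n$ are mine, not the User's Guide's.

```latex
\log p(y \mid \beta, \sigma)
  = \sum_{n \in \mathrm{obs}} \log \mathrm{Normal}(y_n \mid \mu_n, \sigma)
  + \sum_{n \in \mathrm{cens}} \log \Phi\!\left(\frac{L_n - \mu_n}{\sigma}\right)
```

The second sum is what normal_lcdf(L[n] | mu[n], sigma) computes term by term. The right-censored analogue with a bound $U$ replaces each term with $\log\left(1 - \Phi((U - \mu_n)/\sigma)\right)$, which matches the corrected expression above.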

martinmodrak · November 29, 2021, 5:18pm

@Christopher_Campbell thanks very much for noticing and reporting. It took a while, but the issue should be fixed in the next docs release: Censored data documentation has incorrect math · Issue #445 · stan-dev/docs · GitHub
