Censored data with varying, known censoring points

I have a regression problem in which the response variable is sometimes left-censored at a known point, but the censoring point is different for each observation. I’m using rstan as the wrapper.

Below is my code for a single, known censoring point shared by all observations. It is largely copied from section 4.3 of the User’s Guide.

data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  real y_obs[N_obs];
  real L;
  real sigma_params[2];
}
parameters {
  real<upper=L> y_cens[N_cens];
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  y_obs ~ normal(x_obs * beta, sigma);
  y_cens ~ normal(x_cens * beta, sigma);
}

I want to change the specification of the censoring point:

real L[N_cens];
...

Of course, that breaks the declaration of y_cens.

Is there a way to declare the vector y_cens with a known upper limit for each element? I'm guessing it can be done in a loop, but I'm too much of a noob to know how.

Any help will be appreciated.

Sorry we took so long to respond.

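First, you don’t actually need the varying bounds: the manual goes on to show a more efficient implementation where the censored values are integrated out instead of being declared as bounded parameters. Each censored observation then contributes the normal log-CDF evaluated at its own censoring point. So you would need something like: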

for (n in 1:N_cens) {
  // mu[n] is the mean for censored row n, e.g. mu = x_cens * beta
  target += normal_lcdf(L[n] | mu[n], sigma);
}
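For context, here is a minimal sketch of how that loop might sit inside the original model once y_cens is dropped from the parameters block. It is not taken from the manual: mu_cens is a hypothetical local name, and y_obs, L and sigma_params are declared as vectors here (an assumption made so the declarations do not depend on the array syntax of a particular Stan version).

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  vector[N_obs] y_obs;
  vector[N_cens] L;        // assumption: one known censoring point per censored row
  vector[2] sigma_params;
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  // hypothetical local name for the linear predictor of the censored rows
  vector[N_cens] mu_cens = x_cens * beta;

  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  y_obs ~ normal(x_obs * beta, sigma);

  // each left-censored row contributes log Pr[y < L[n]]
  for (n in 1:N_cens) {
    target += normal_lcdf(L[n] | mu_cens[n], sigma);
  }
}
```

Because the censored values are no longer sampled, the model has N_cens fewer parameters, which is where the efficiency gain comes from.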

If you actually need varying bounds (which I think is not the case), then the best way is to leave the parameters unconstrained and do the constraining transforms yourself (the transforms are described at https://mc-stan.org/docs/2_23/reference-manual/lower-bound-transform-section.html).
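A rough sketch of that manual transform, assuming the same data as in the question but with L given per censored observation. The name y_cens_raw is hypothetical, and the data containers are declared as vectors for convenience. The upper-bound transform maps an unconstrained value to L[n] - exp(value), and its log Jacobian is the unconstrained value itself.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  vector[N_obs] y_obs;
  vector[N_cens] L;            // assumption: per-observation upper limits
  vector[2] sigma_params;
}
parameters {
  vector[N_cens] y_cens_raw;   // hypothetical name; unconstrained
  vector[K] beta;
  real<lower=0> sigma;
}
transformed parameters {
  // guarantees y_cens[n] < L[n] for every n
  vector[N_cens] y_cens = L - exp(y_cens_raw);
}
model {
  // log Jacobian of the transform: log|d y_cens / d y_cens_raw| = y_cens_raw
  target += sum(y_cens_raw);

  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  y_obs ~ normal(x_obs * beta, sigma);
  y_cens ~ normal(x_cens * beta, sigma);
}
```

This keeps the imputed censored values available as y_cens in the output, at the cost of sampling N_cens extra parameters.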

Best of luck!

Hi @martinmodrak, thanks for this example.

This example confused me initially because the documentation is incorrect with regard to the probability under censoring on this page: 4.3 Censored data | Stan User’s Guide.

The issue is that the final equation in the first block should be $1 - \Phi((U - \mu)/\sigma)$, not $1 - \Phi((y - \mu)/\sigma)$.

The code is correct as written (on the page and in your sample code above). However, the statement “you don’t actually need the varying bounds” is incorrect or unclear, since the likelihood depends on the varying bounds both in the code and in the correct expression.
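To spell out that dependence for the left-censored case in this thread, here is a sketch of the likelihood, assuming a normal model with mean $x_n^\top \beta$ and writing $L_m$ for the censoring point of censored observation $m$ (notation introduced here, not taken from the User's Guide):

```latex
\Pr[y_m < L_m] = \Phi\!\left(\frac{L_m - x_m^\top \beta}{\sigma}\right),
\qquad
p(y_{\text{obs}}, \text{censoring} \mid \beta, \sigma)
  = \prod_{n=1}^{N_{\text{obs}}} \mathrm{Normal}\!\left(y_n \mid x_n^\top \beta, \sigma\right)
    \prod_{m=1}^{N_{\text{cens}}} \Phi\!\left(\frac{L_m - x_m^\top \beta}{\sigma}\right)
```

The bounds $L_m$ still appear in every censored term. What gets integrated out is the unobserved value, not the bound.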


@Christopher_Campbell thanks very much for noticing and reporting. It took a while, but the issue should be fixed in the next docs release: Censored data documentation has incorrect math · Issue #445 · stan-dev/docs · GitHub
