Censored regression via rstan

Sorry for a bunch of newbie questions, but debugging takes a long time due to having to wait for everything to compile and run.

I want to implement an R package for regression with censored response data in a fairly general way, allowing Gaussian or t-distributed noise allowing the kurtosis parameter to be fixed or variable.

Using Section 4.3 of the Stan User’s Guide, I think I’ve got the first approach, Estimating Censored Values, working ok. I’m having a lot less fun with the second approach, Integrating out Censored Values (and it seems to me that that approach should work out better due to there being fewer things being estimated).

My Stan code, for left-censored response, is below. If I run the model, it takes forever (well, I kill the processes after several minutes). If I do the same with the Estimating Censored Values approach, it works just fine on the same data. So it’s not awkward data, it’s my code.

I’d be grateful if someone could point out to me what is likely wrong with it. The last line might be the place to start.

data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  real y_obs[N_obs];
  real L;

  real lognu_params[3];
  real sigma_params[2];
parameters {
  vector[K] beta;
  real<lower=0> lognu; // because if nu < 1, I won't believe it for the data I have
  real<lower=0> sigma;
model {
  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  lognu ~ student_t(lognu_params[1], lognu_params[2], lognu_params[3]);
  y_obs ~ student_t(exp(lognu), x_obs * beta, sigma);
  target += N_cens * student_t_lcdf(L | exp(lognu), x_cens * beta, sigma);

Other related questions:

Logically, if I want to fix nu (including setting it to Inf to allow Gaussian noise), I should specify its scale parameter to be 0, but that seems to cause it to fail. As such, I need to write largely duplicated code to cover fixed nu and infinite nu. Is that right or is there a way of passing parameters into the student_t functions that allow no variation and infinite nu? (I appreciate that the purpose of those functions is to allow variation, it’s a question about trying to avoid duplicating code.)

Also, if I want to allow right censoring I ought to be able to just multiply the response by -1 in my R wrapper, and then multiply the chains by -1. Is there a simple way of manipulating the chains like that, without causing issues with rstan::extract and so on further down the line?

Thanks in advance.

That line is wrong. I’m passing in the predictors x_cens but multiplying by the number of censored values as well. Presumably I want to get rid of the multiplier N_cens, and sum over the output of student_t_lcdf. Does anyone agree or disagree? Do I need a for loop to do that summation?


The _lccdf functions should be vectorized and thus return the sum of all the results if at least one of the parameters is a vector. Hence removing N_cens is likely what you are after.

Thanks. I’ve got a simplified version running and it appears to retrieve the data generating mechanism from simulated data.

Although it appears to work, it also prints a horrible error message to screen. I don’t know where it comes from, but could be R rather than Stan or rstan. Has this been seen before?

> Error in envRefInferField(x, what, getClass(class(x)), selfEnv) : 
>   no slot of name "fieldPrototypes" for this object of class "classRepresentation"
> Calls: <Anonymous> -> $ -> envRefInferField

Thanks again,

The error message is strange, but it was discussed here: