Censored regression via rstan

Harry_Southworth · May 7, 2020, 9:02am

Sorry for a bunch of newbie questions, but debugging takes a long time due to having to wait for everything to compile and run.

I want to implement an R package for regression with censored response data in a fairly general way, allowing Gaussian or t-distributed noise allowing the kurtosis parameter to be fixed or variable.

Using Section 4.3 of the Stan User’s Guide, I think I’ve got the first approach, Estimating Censored Values, working ok. I’m having a lot less fun with the second approach, Integrating out Censored Values (and it seems to me that that approach should work out better due to there being fewer things being estimated).

My Stan code, for left-censored response, is below. If I run the model, it takes forever (well, I kill the processes after several minutes). If I do the same with the Estimating Censored Values approach, it works just fine on the same data. So it’s not awkward data, it’s my code.

I’d be grateful if someone could point out to me what is likely wrong with it. The last line might be the place to start.

data {
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int<lower=0> K;
  matrix[N_obs, K] x_obs;
  matrix[N_cens, K] x_cens;
  real y_obs[N_obs];
  real L;

  real lognu_params[3];
  real sigma_params[2];
}
parameters {
  vector[K] beta;
  real<lower=0> lognu; // because if nu < 1, I won't believe it for the data I have
  real<lower=0> sigma;
}
model {
  sigma ~ lognormal(sigma_params[1], sigma_params[2]);
  lognu ~ student_t(lognu_params[1], lognu_params[2], lognu_params[3]);
  y_obs ~ student_t(exp(lognu), x_obs * beta, sigma);
  target += N_cens * student_t_lcdf(L | exp(lognu), x_cens * beta, sigma);
}

Other related questions:

Logically, if I want to fix nu (including setting it to Inf to allow Gaussian noise), I should specify its scale parameter to be 0, but that seems to cause it to fail. As such, I need to write largely duplicated code to cover fixed nu and infinite nu. Is that right or is there a way of passing parameters into the student_t functions that allow no variation and infinite nu? (I appreciate that the purpose of those functions is to allow variation, it’s a question about trying to avoid duplicating code.)

Also, if I want to allow right censoring I ought to be able to just multiply the response by -1 in my R wrapper, and then multiply the chains by -1. Is there a simple way of manipulating the chains like that, without causing issues with rstan::extract and so on further down the line?

Thanks in advance.

Harry_Southworth · May 7, 2020, 3:48pm

That line is wrong. I’m passing in the predictors x_cens but multiplying by the number of censored values as well. Presumably I want to get rid of the multiplier N_cens, and sum over the output of student_t_lcdf. Does anyone agree or disagree? Do I need a for loop to do that summation?

Thanks.

martinmodrak · May 12, 2020, 2:49pm

The _lccdf functions should be vectorized and thus return the sum of all the results if at least one of the parameters is a vector. Hence removing N_cens is likely what you are after.

Harry_Southworth · May 12, 2020, 3:49pm

Thanks. I’ve got a simplified version running and it appears to retrieve the data generating mechanism from simulated data.

Although it appears to work, it also prints a horrible error message to screen. I don’t know where it comes from, but could be R rather than Stan or rstan. Has this been seen before?

> Error in envRefInferField(x, what, getClass(class(x)), selfEnv) : 
>   no slot of name "fieldPrototypes" for this object of class "classRepresentation"
> Calls: <Anonymous> -> $ -> envRefInferField

Thanks again,
Harry

martinmodrak · May 12, 2020, 6:10pm

The error message is strange, but it was discussed here:

Topic		Replies	Views
Stan Users Guide - 4.3 Censored Data Modeling fitting-issues	0	302	August 23, 2023
Censored regression Modeling	2	464	August 8, 2022
Censored data with varying, known censoring points Modeling	3	674	November 29, 2021
Minimal censored Gamma with VERY bad performance General fitting-issues	27	1127	July 8, 2022
Integrating out Censored Values - Error following manual's example Modeling	4	642	October 31, 2018

Censored regression via rstan

Related topics