Sorry for a bunch of newbie questions, but debugging takes a long time due to having to wait for everything to compile and run.
I want to implement an R package for regression with censored response data in a fairly general way, allowing Gaussian or t-distributed noise allowing the kurtosis parameter to be fixed or variable.
Using Section 4.3 of the Stan User’s Guide, I think I’ve got the first approach, Estimating Censored Values, working ok. I’m having a lot less fun with the second approach, Integrating out Censored Values (and it seems to me that that approach should work out better due to there being fewer things being estimated).
My Stan code, for left-censored response, is below. If I run the model, it takes forever (well, I kill the processes after several minutes). If I do the same with the Estimating Censored Values approach, it works just fine on the same data. So it’s not awkward data, it’s my code.
I’d be grateful if someone could point out to me what is likely wrong with it. The last line might be the place to start.
data {
int<lower=0> N_obs;
int<lower=0> N_cens;
int<lower=0> K;
matrix[N_obs, K] x_obs;
matrix[N_cens, K] x_cens;
real y_obs[N_obs];
real L;
real lognu_params[3];
real sigma_params[2];
}
parameters {
vector[K] beta;
real<lower=0> lognu; // because if nu < 1, I won't believe it for the data I have
real<lower=0> sigma;
}
model {
sigma ~ lognormal(sigma_params[1], sigma_params[2]);
lognu ~ student_t(lognu_params[1], lognu_params[2], lognu_params[3]);
y_obs ~ student_t(exp(lognu), x_obs * beta, sigma);
target += N_cens * student_t_lcdf(L | exp(lognu), x_cens * beta, sigma);
}
Other related questions:
Logically, if I want to fix nu (including setting it to Inf to allow Gaussian noise), I should specify its scale parameter to be 0, but that seems to cause it to fail. As such, I need to write largely duplicated code to cover fixed nu and infinite nu. Is that right or is there a way of passing parameters into the student_t functions that allow no variation and infinite nu? (I appreciate that the purpose of those functions is to allow variation, it’s a question about trying to avoid duplicating code.)
Also, if I want to allow right censoring I ought to be able to just multiply the response by -1 in my R wrapper, and then multiply the chains by -1. Is there a simple way of manipulating the chains like that, without causing issues with rstan::extract and so on further down the line?
Thanks in advance.