I’m trying to fit left-censored data with many zeros. My plan was to use a log-normal hurdle model, and in the log-normal process, I was hoping I could impute the censored observations following the manual here. No luck however in recovering the right parameters as I’m not sure how to index the hurdle part in the model block.
To test out my model, I generated positive values from a lognormal distribution. A portion of these were then removed at probability = pZero
, i.e. turned into zeros. Then I censored all present values below a threshold (L
).
# simulate data
N <- 1000
mu <- 0
sigma <- 1
y_raw <- rlnorm(N, mu, sd = sigma)
dens(y_raw)
# add in other processes
L <- 1 #censor cutoff--below this value, shows up as zero.
pZero <- .5 #probability of actually being absent.
present <- rbinom(N, 1, pZero)
y_obs <- y_raw[y_raw > L & present == 1]
# make data list for stan
dataList <- list(
N = N, # total observations
N_obs = length(y_obs),
N_cens = sum(present) - length(y_obs),
present = present,
y_obs = y_obs
)
str(dataList)
The Stan code I’m using is below:
data {
int N;
int N_obs;
int N_cens;
array[N] int present;
array[N_obs] real<lower=0> y_obs;
}
parameters {
real mu;
real<lower=0> sigma;
real theta;
array[N_cens] real<lower=0> y_cens;
}
model {
// priors
theta ~ normal(0, 1);
mu ~ normal(0, 1);
sigma ~ normal(0, 1);
// likelihood
for(i in 1:N){
if(present[i] == 0){
target += log(inv_logit(theta));
}else{
target += log1m(inv_logit(theta));
y_cens ~ lognormal(mu, sigma);
y_obs ~ lognormal(mu, sigma);
}
}
}
The model does run, but the posteriors for mu and sigma are really tight and not anywhere near 0 and 1 (90%CI 0.223-0.229 and 0.692-0.696 on my latest run).
Any tips on how to properly code the model block? Without censored data, I wouldn’t use the data variable present
, but I couldn’t figure out a different way. Clearly mine isn’t working.