Hi, I have a censored data and I am trying to fit a hierarchical model with latent variable(Z). Here is the model
Z_{ij} \sim Bernoulli( P_z ).
\tilde{Y}_{ij} \sim Normal(\theta_i, \sigma).
Y_{ij} = M if Z_{ij} = 1 or \tilde{Y}_{ij} > M.
Y_{ij} = \tilde{Y}_{ij} if Z = 0 or \tilde{Y}_{ij} \leq M.
The model consider two types of censoring. Any observation has certain chance (P_z) to be censored. The other type of censoring is when the value gets too large.
When fitting the model I put a beta prior on P_z; a Gamma prior on \sigma and a normal prior on \theta_i's.
Here is the code where I used to generate the data.
i_size = 3
j_num = 500
sample_size = i_size * j_num
p_z = 0.15
M = 40
sigma = 1
np.random.seed(1)
z_list = np.random.binomial(n=1, p=p_z, size=sample_size)
theta_list = np.random.uniform(10, 40, size=i_size)
y_ct_list = np.random.normal(loc=np.repeat(theta_list, j_num), scale=sigma, size=sample_size)
y_list = np.repeat(np.nan, sample_size)
observed_boolean_list = (z_list == 0) & (y_ct_list <= M)
y_list[observed_boolean_list] = y_ct_list[observed_boolean_list]
x_array = np.repeat(np.arange(i_size), j_num)
x_mat = np.zeros((x_array.size, x_array.max()+1))
x_mat[np.arange(x_array.size), x_array] = 1
When I tried to fit the model I described above I tried to fit the equivalent mixture model instead.
data {
int<lower=0> N_cens;
int<lower=0> N_obs;
int<lower=1> i_size;
real M;
vector<lower=0>[N_obs] y_obs;
matrix[N_obs, i_size] x_obs;
matrix[N_cens, i_size] x_cens;
}
parameters {
real<lower=0, upper=1> p_z_cens;
real<lower=0> sigma;
vector<lower=0>[i_size] theta;
}
model {
sigma ~ gamma(0.2, 0.1);
theta ~ normal(20, 7);
p_z_cens ~ beta(2, 5);
for (i in 1:N_obs) {
target += log(1 - p_z_cens) + normal_lpdf(y_obs[i] | x_obs[i] * theta, sigma);
}
for (i in 1:N_cens) {
target += log_mix(p_z_cens , 0, normal_lccdf(M | row(x_cens, i) * theta, sigma));
//target += log_sum_exp(log(p_z_cens),log(1 - p_z_cens) + normal_lccdf(censored_ct | x_cens[i] * theta, sigma));
}
}
To explain what happens in the model block, when the data is not censored the likelihood f(y_{ij}) is
(1 - P_z) * normal\_pdf(y_{ij}). When the data is censored the likelihood is P(Y_{ij} = M) = P_z * 1 + (1 - P_z) * (1 - Normal\_cdf(M)).
Here is how I code the model_data.
model_data = {"N_cens": int(sum(np.isnan(y_list))), "N_obs": int(sum(~np.isnan(y_list))), "i_size": i_size,
"M": M, "y_obs": y_list[~np.isnan(y_list)],
"x_obs": x_mat[~np.isnan(y_list), :],
"x_cens": x_mat[np.isnan(y_list), :]}
There are several problems I encounter.
- The warning message says “The parameter p_z_cens has 2 priors.” which I don’t see why it is the case.
- The trace plots of \sigma and P_z don’t appear to be random when the number of replicates for each I(j_size) gets large. For example, one trace plot I got is
- The posterior distributions are not concentrating more and more around the true value as the sample size gets large. For \sigma, it’s completely off.
If anyone can tell me what I did wrong (ie. I didn’t use the correct likelihood or error in Stan code), I really appreciate it. Thanks!