Fitting a latent variable model

charles19920528 · April 26, 2021, 3:45am

Hi, I have a censored data and I am trying to fit a hierarchical model with latent variable(Z). Here is the model
Z_{ij} \sim Bernoulli( P_z ).
\tilde{Y}_{ij} \sim Normal(\theta_i, \sigma).
Y_{ij} = M if Z_{ij} = 1 or \tilde{Y}_{ij} > M.
Y_{ij} = \tilde{Y}_{ij} if Z = 0 or \tilde{Y}_{ij} \leq M.
The model consider two types of censoring. Any observation has certain chance (P_z) to be censored. The other type of censoring is when the value gets too large.

When fitting the model I put a beta prior on P_z; a Gamma prior on \sigma and a normal prior on \theta_i's.

Here is the code where I used to generate the data.

i_size = 3
j_num = 500
sample_size = i_size * j_num
p_z = 0.15
M = 40 
sigma = 1

np.random.seed(1)
z_list = np.random.binomial(n=1, p=p_z, size=sample_size)
theta_list = np.random.uniform(10, 40, size=i_size)
y_ct_list = np.random.normal(loc=np.repeat(theta_list, j_num), scale=sigma, size=sample_size)
y_list = np.repeat(np.nan, sample_size)

observed_boolean_list = (z_list == 0) & (y_ct_list <= M)
y_list[observed_boolean_list] = y_ct_list[observed_boolean_list]
x_array = np.repeat(np.arange(i_size), j_num)
x_mat = np.zeros((x_array.size, x_array.max()+1))
x_mat[np.arange(x_array.size), x_array] = 1

When I tried to fit the model I described above I tried to fit the equivalent mixture model instead.

data {
  int<lower=0> N_cens;
  int<lower=0> N_obs;
  int<lower=1> i_size;
  real M;
  vector<lower=0>[N_obs] y_obs;
  matrix[N_obs, i_size] x_obs;
  matrix[N_cens, i_size] x_cens;
}
parameters {
  real<lower=0, upper=1> p_z_cens;
  real<lower=0> sigma;
  vector<lower=0>[i_size] theta;
}
model {
  sigma ~ gamma(0.2, 0.1);
  theta ~ normal(20, 7);
  p_z_cens ~ beta(2, 5);
  for (i in 1:N_obs) {
    target += log(1 - p_z_cens) + normal_lpdf(y_obs[i] | x_obs[i] * theta, sigma);
  }
  for (i in 1:N_cens) {
    target += log_mix(p_z_cens , 0, normal_lccdf(M | row(x_cens, i) * theta, sigma));
    //target += log_sum_exp(log(p_z_cens),log(1 - p_z_cens) + normal_lccdf(censored_ct | x_cens[i] * theta, sigma));
  }
}

To explain what happens in the model block, when the data is not censored the likelihood f(y_{ij}) is
(1 - P_z) * normal\_pdf(y_{ij}). When the data is censored the likelihood is P(Y_{ij} = M) = P_z * 1 + (1 - P_z) * (1 - Normal\_cdf(M)).

Here is how I code the model_data.

model_data = {"N_cens": int(sum(np.isnan(y_list))), "N_obs": int(sum(~np.isnan(y_list))), "i_size": i_size,
              "M": M, "y_obs": y_list[~np.isnan(y_list)],
              "x_obs": x_mat[~np.isnan(y_list), :],
              "x_cens": x_mat[np.isnan(y_list), :]}

There are several problems I encounter.

The warning message says “The parameter p_z_cens has 2 priors.” which I don’t see why it is the case.
The trace plots of \sigma and P_z don’t appear to be random when the number of replicates for each I(j_size) gets large. For example, one trace plot I got is

Screen Shot 2021-04-25 at 8.17.47 PM896×480 45 KB
The posterior distributions are not concentrating more and more around the true value as the sample size gets large. For \sigma, it’s completely off.

Screen Shot 2021-04-25 at 8.20.41 PM1778×1350 243 KB

If anyone can tell me what I did wrong (ie. I didn’t use the correct likelihood or error in Stan code), I really appreciate it. Thanks!

Ara_Winter · May 3, 2021, 3:07pm

Hi and welcome. Sorry for the long wait. This type of model is not my area but this post will bump your question back to the top.

charles19920528 · May 3, 2021, 3:55pm

Thanks. I was able to fit the model with pytwalk. However, I still have no clue how to fit with Stan.

Ara_Winter · May 3, 2021, 7:24pm

Could you post your model call as well? Unless I missed it.

charles19920528 · May 3, 2021, 7:31pm

I built a class as a wrapper to use the Stan package. Here is the essence of the class.

class StanModel:
    def __init__(self, model_code, model_data):
        self.model_code = model_code
        self.model_data = model_data
        self.posterior_dt = None

    def fit_posterior(self, random_seed=1, num_chains=1, num_samples=10000, **kwargs):
        posterior = stan.build(self.model_code, data=self.model_data, random_seed=random_seed)
        fit = posterior.sample(num_chains=num_chains, num_samples=num_samples, **kwargs)
        self.posterior_dt = fit.to_frame()

Here is how I called it.

bm_instance = bm.StanModel(model_code=model_code, model_data=model_data)
bm_instance.fit_posterior(random_seed=1, num_chains=1, num_samples=mcmc_trials)

Ara_Winter · May 3, 2021, 10:47pm

Just to start with things I know that are sticking out to me.

num_chains = 1. Unless you are debugging sme chain errors I would set this to 4.

I would also turn off the random_seed for right now.

How many cores are you running?

charles19920528 · May 3, 2021, 11:07pm

I didn’t set core number so it should the default number of the sample method. The reason I set num_chains = 1 is because I am not sure how to produce trace plots with multiple chains. As far as I know, I should plot trace from all chains separately. However right now, fit.to_frame() lumps MCMC samples from all chains into a single data frame and I cannot know which chain each sample belongs to. I am wondering if these MCMC samples in each column are ordered as [sampleS from chain 1, sampleS from chain 2 … sampleS from chain 4]. It’s not in the documentation (to the best of my knowledge) and there doesn’t seem to be a built-in method for trace plots.

Topic		Replies	Views
Initialization failed Modeling fitting-issues	3	2656	July 3, 2018
Modeling hierarchical binomial parameters with censored observations Modeling specification	4	519	February 23, 2021
Hierarchical bayesian regression PyStan techniques , fitting-issues , hierarchical-model	6	1043	May 22, 2020
Modelling dependent variable with some kind of rounding Modeling specification	7	788	September 6, 2018
Regression with censored explanatory variable Modeling	3	1068	May 10, 2020

Fitting a latent variable model

Related topics