Using two models to estimate the missing data in an independent variable


The response variable is y. The independent variable x has both observed values x_{obs} and missing values x_{mis}. I am wondering whether I can use two models that both involve the same missing data x_{mis}. The first is a model for the distribution of the independent variable itself. The second is a linear model y \sim x. Example code is below.

data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  vector[N_obs] y_obs1;
  vector[N_obs] x_obs;
  vector[N_mis] y_obs2;
}
parameters {
  real mu;
  real<lower=0> sigma1;
  vector[N_mis] x_mis;

  vector[2] b;
  real<lower=0> sigma2;
}
model {
  mu ~ N(0, 1);
  sigma1 ~ N(0, 1) T[0, ];
  x_mis ~ N(0, 1);
  x_obs ~ N(mu, sigma1);
  x_mis ~ N(mu, sigma1); // first model for x_mis

  b ~ N(0, 1);
  sigma2 ~ N(0, 1) T[0, ];

  for (n in 1:N_obs) {
    y_obs1[n] ~ N(b[1] + b[2] * x_obs[n], sigma2);
  }
  for (n in 1:N_mis) {
    y_obs2[n] ~ N(b[1] + b[2] * x_mis[n], sigma2); // second model for x_mis
  }
}

The model can be fitted. But I don’t understand why Stan can use two models to estimate the same missing data. And I’m also wondering if the model is valid. Any advice?

As written, this isn’t a well-formed Stan program: Stan spells the normal distribution normal, not N.

I think in the above model you do not want x_mis ~ normal(0, 1). If the data distribution is normal(mu, sigma1), then you just want x_mis ~ normal(mu, sigma1).

By putting two priors on the same parameter, you get their product in Stan. So your version looks like this:

p(x_mis) \propto normal(x_mis | 0, 1) * normal(x_mis | mu, sigma1).

So the density is well defined in Stan; it’s just not what you want for missing data imputation.
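To see what that product does, here is a short worked identity. It treats mu and sigma1 as fixed and looks only at the dependence on x_mis; the second argument of each normal is a standard deviation, as in Stan. The factor dropped by the proportionality sign depends on mu and sigma1, so in the full model the extra normal(0, 1) term also changes their posterior.

```latex
\[
\mathcal{N}(x_{mis} \mid 0, 1)\,
\mathcal{N}(x_{mis} \mid \mu, \sigma_1)
\;\propto\;
\mathcal{N}\!\left(x_{mis} \,\Big|\,
  \frac{\mu}{1 + \sigma_1^2},\;
  \frac{\sigma_1}{\sqrt{1 + \sigma_1^2}}\right)
\]
```

In other words, the extra term shrinks each imputed value toward 0 and makes it look more certain than the data model alone would.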

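Also, you don’t need the T[0, ] truncation on the sigma2 sampling statement (the same goes for sigma1). The parameters of that prior are constants, so the truncation only adds a constant to the log density, and the <lower=0> declaration already restricts sigma2 to positive values.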
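Putting these points together, a corrected version of the model might look like the sketch below. The variable names and the normal(0, 1) priors are carried over from the question unchanged. The vectorized sampling statements are a stylistic choice and are equivalent to the loops.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  vector[N_obs] y_obs1;  // responses whose x was observed
  vector[N_obs] x_obs;
  vector[N_mis] y_obs2;  // responses whose x is missing
}
parameters {
  real mu;
  real<lower=0> sigma1;
  vector[N_mis] x_mis;   // missing x values, treated as parameters

  vector[2] b;
  real<lower=0> sigma2;
}
model {
  mu ~ normal(0, 1);
  sigma1 ~ normal(0, 1);  // half-normal via <lower=0>; no T[0, ] needed

  // one model for x, shared by observed and missing values
  x_obs ~ normal(mu, sigma1);
  x_mis ~ normal(mu, sigma1);

  b ~ normal(0, 1);
  sigma2 ~ normal(0, 1);

  // regression of y on x; x_mis is informed by y_obs2 through this term
  y_obs1 ~ normal(b[1] + b[2] * x_obs, sigma2);
  y_obs2 ~ normal(b[1] + b[2] * x_mis, sigma2);
}
```

Here x_mis appears in two sampling statements on purpose: normal(mu, sigma1) acts as its prior and the regression on y_obs2 acts as its likelihood, so the posterior for each missing value combines both.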

You can always simulate data where you know the answer, then remove some of the x values and see how well the model imputes them.
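One way to do that simulation without leaving Stan is a program with only a generated quantities block, run with the fixed_param algorithm. This is a minimal sketch: the data names N, mu, sigma1, b, and sigma2 are assumptions chosen to mirror the question, and the "true" values are whatever you pass in.

```stan
data {
  int<lower=0> N;        // total number of simulated cases
  real mu;               // true mean of x
  real<lower=0> sigma1;  // true sd of x
  vector[2] b;           // true intercept and slope
  real<lower=0> sigma2;  // true residual sd
}
generated quantities {
  vector[N] x;
  vector[N] y;
  for (n in 1:N) {
    x[n] = normal_rng(mu, sigma1);
    y[n] = normal_rng(b[1] + b[2] * x[n], sigma2);
  }
}
```

Take one draw of x and y, hide some of the x values outside Stan, and pass the rest to the imputation model as x_obs, y_obs1, and y_obs2. The posterior for x_mis can then be compared against the values you hid.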

Thanks Bob for your helpful answers. I’ll check the missing data issue further.

I now have a clear idea.

It’s extremely helpful to know that x_mis ~ normal(mu, sigma1) can be regarded as a prior. I tried a model with one parameter and only its prior given; the model fitted, and the draws looked like data generated from that prior.

I agree that the truncation is not necessary. I feel that a truncated distribution is only needed when the bounds can produce 0 log probability.