# Using two models to estimate the missing data in an independent variable

Hi,

The response variable is $y$. The independent variable $x$ has both observed values $x_{obs}$ and missing values $x_{mis}$. I am wondering whether I can use two models that involve the same missing data $x_{mis}$. The first model is the distribution of the independent variable itself. The second is a linear model $y \sim x$. Below is the example code.

```stan
data {
  int<lower=0> N_obs;
  int<lower=0> N_mis;
  vector[N_obs] y_obs1;
  vector[N_obs] x_obs;
  vector[N_mis] y_obs2;
}
parameters {
  real mu;
  real<lower=0> sigma1;
  vector[N_mis] x_mis;

  vector[2] b;
  real<lower=0> sigma2;
}
model {
  mu ~ N(0, 1);
  sigma1 ~ N(0, 1) T[0, ];
  x_mis ~ N(0, 1);
  x_obs ~ N(mu, sigma1);
  x_mis ~ N(mu, sigma1); // first model for x_mis

  b ~ N(0, 1);
  sigma2 ~ N(0, 1) T[0, ];

  for (n in 1:N_obs) {
    y_obs1[n] ~ N(b[1] + b[2] * x_obs[n], sigma2);
  }

  for (n in 1:N_mis) {
    y_obs2[n] ~ N(b[1] + b[2] * x_mis[n], sigma2); // second model for x_mis
  }
}
```

The model can be fitted, but I don’t understand how Stan can use two models to estimate the same missing data. I’m also wondering whether the model is valid. Any advice?

This isn’t a well-formed Stan model: Stan uses `normal`, not `N`, for normal distributions.

I think in the above model you do not want `x_mis ~ normal(0, 1)`. If the data distribution is `normal(mu, sigma1)`, then you just want `x_mis ~ normal(mu, sigma1)`.
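Written out, with only that one statement on the missing values, the part of the joint density involving a single missing value $x_{mis,n}$ is just the product of the two models: the first acts as a prior on $x_{mis,n}$ and the regression acts as its likelihood. This is a sketch conditional on the other parameters, using the question’s symbols with $b = (b_1, b_2)$:

$$
p(x_{mis,n} \mid y_{obs2,n}, \mu, \sigma_1, b, \sigma_2) \propto \textrm{normal}(x_{mis,n} \mid \mu, \sigma_1)\,\textrm{normal}(y_{obs2,n} \mid b_1 + b_2 x_{mis,n}, \sigma_2)
$$

Completing the square in $x_{mis,n}$ gives a normal with standard deviation $s$ and mean $m$:

$$
\frac{1}{s^2} = \frac{1}{\sigma_1^2} + \frac{b_2^2}{\sigma_2^2},
\qquad
m = s^2 \left( \frac{\mu}{\sigma_1^2} + \frac{b_2\,(y_{obs2,n} - b_1)}{\sigma_2^2} \right)
$$

So the two models do not conflict: the first pulls each imputed value toward $\mu$, the regression pulls it toward values consistent with its observed $y$, and the result weighs the two by their precisions. In the full Stan model, $\mu$, $\sigma_1$, $b$ and $\sigma_2$ are sampled too rather than held fixed.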

Each sampling statement adds a term to the log density, so giving `x_mis` two priors multiplies the two densities. Your version therefore amounts to:

$$
p(x_{mis}) \propto \textrm{normal}(x_{mis} \mid 0, 1) \cdot \textrm{normal}(x_{mis} \mid \mu, \sigma_1)
$$
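Completing the square shows what that product does to each element of $x_{mis}$. This is the standard identity for a product of two normal densities, written with the standard deviation as the second argument, as in Stan:

$$
\textrm{normal}(x_{mis} \mid 0, 1)\,\textrm{normal}(x_{mis} \mid \mu, \sigma_1)
= \textrm{normal}\!\left(0 \,\middle|\, \mu, \sqrt{1 + \sigma_1^2}\right)
\textrm{normal}\!\left(x_{mis} \,\middle|\, \frac{\mu}{1 + \sigma_1^2}, \frac{\sigma_1}{\sqrt{1 + \sigma_1^2}}\right)
$$

The extra statement therefore shrinks the imputed values toward 0 with a spread narrower than $\sigma_1$, and the first factor adds a term pulling $\mu$ toward 0 once per missing value.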

So two sampling statements on the same parameter are legal in Stan, just not what you want for missing-data imputation.

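Also, you don’t need the truncation `T[0, ]` on the `sigma2` sampling statement (the same goes for `sigma1`). The truncation bounds and the prior’s parameters are constants, so it only adds a constant to the log density; the `<lower=0>` declaration already keeps `sigma2` positive.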
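For reference, here is a sketch of the question’s model with those suggestions applied: `normal` instead of `N`, only `x_mis ~ normal(mu, sigma1)` on the missing values, and no `T[0, ]`. It keeps the question’s variable names and priors; the loops are replaced by vectorized statements, which is a stylistic choice that gives the same log density. Treat it as a starting point rather than a checked analysis.

```stan
data {
  int<lower=0> N_obs;      // rows where x is observed
  int<lower=0> N_mis;      // rows where x is missing
  vector[N_obs] y_obs1;    // y for rows with observed x
  vector[N_obs] x_obs;     // observed x values
  vector[N_mis] y_obs2;    // y for rows with missing x
}
parameters {
  real mu;                 // mean of x
  real<lower=0> sigma1;    // sd of x
  vector[N_mis] x_mis;     // missing x values, treated as parameters
  vector[2] b;             // intercept b[1] and slope b[2]
  real<lower=0> sigma2;    // residual sd of the regression
}
model {
  // priors; with <lower=0> declared, these act as half-normal priors
  // (no T[0, ] needed, it would only add a constant)
  mu ~ normal(0, 1);
  sigma1 ~ normal(0, 1);
  b ~ normal(0, 1);
  sigma2 ~ normal(0, 1);

  // first model: distribution of x, shared by observed and missing values
  x_obs ~ normal(mu, sigma1);
  x_mis ~ normal(mu, sigma1);

  // second model: linear regression of y on x
  y_obs1 ~ normal(b[1] + b[2] * x_obs, sigma2);
  y_obs2 ~ normal(b[1] + b[2] * x_mis, sigma2);
}
```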

You can always simulate data where you know the true values, check that the model recovers them, and then delete some of the `x` values and see how well the model imputes them.
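One way to do that simulation in Stan itself is a separate program with no parameters, only a `generated quantities` block, run with the `fixed_param` algorithm. The data names `N`, `mu`, `sigma1`, `b` and `sigma2` below are hypothetical inputs holding the "true" values you choose. After simulating, you would keep `x` for some rows (giving `x_obs` and `y_obs1`) and drop it for the rest (giving `y_obs2`).

```stan
// Hypothetical fake-data simulator: pass the "true" parameter values in as
// data and run with the fixed_param algorithm (nothing here is sampled by MCMC).
data {
  int<lower=0> N;          // total number of rows to simulate
  real mu;                 // true mean of x
  real<lower=0> sigma1;    // true sd of x
  vector[2] b;             // true intercept and slope
  real<lower=0> sigma2;    // true residual sd
}
generated quantities {
  vector[N] x;             // simulated independent variable
  vector[N] y;             // simulated response
  for (n in 1:N) {
    x[n] = normal_rng(mu, sigma1);
    y[n] = normal_rng(b[1] + b[2] * x[n], sigma2);
  }
}
```

Each draw is one simulated dataset. Fitting the imputation model to it and checking whether the posteriors for `mu`, `b` and the deleted `x` values cover the true values is the kind of check described above.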

It’s extremely helpful to see that `x_mis ~ normal(mu, sigma1)` can be regarded as a prior. I tried a parameter with only a prior and no likelihood; the model still fitted, and the draws looked like generated data.
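A minimal illustration of that observation, using a hypothetical parameter `theta`: with only a prior and no likelihood term, sampling this program simply draws `theta` from `normal(0, 1)`, which is why the output looks like simulated data.

```stan
parameters {
  real theta;              // hypothetical parameter for illustration
}
model {
  theta ~ normal(0, 1);    // prior only; no data, so the posterior equals the prior
}
```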