Hi,
The response variable is y. The independent variable x has both observed values x_{obs} and missing values x_{mis}. I am wondering whether I can use two models that both involve the same missing data x_{mis}. The first is a model for the distribution of the independent variable itself. The second is a linear regression y \sim x. The example code is below.
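data {
  int<lower=0> N_obs;      // number of cases with x observed
  int<lower=0> N_mis;      // number of cases with x missing
  vector[N_obs] y_obs1;    // responses paired with x_obs
  vector[N_obs] x_obs;     // observed values of x
  vector[N_mis] y_obs2;    // responses paired with x_mis
}
parameters {
  real mu;                 // mean of x
  real<lower=0> sigma1;    // sd of x
  vector[N_mis] x_mis;     // missing values of x, treated as parameters
  vector[2] b;             // intercept and slope
  real<lower=0> sigma2;    // residual sd of y
}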
model {
  mu ~ normal(0, 1);
  sigma1 ~ normal(0, 1) T[0, ];
  x_mis ~ normal(0, 1);            // extra normal(0, 1) term on x_mis
  x_obs ~ normal(mu, sigma1);
  x_mis ~ normal(mu, sigma1);      // first model for x_mis
  b ~ normal(0, 1);
  sigma2 ~ normal(0, 1) T[0, ];
  for (n in 1:N_obs) {
    y_obs1[n] ~ normal(b[1] + b[2] * x_obs[n], sigma2);
  }
  for (n in 1:N_mis) {
    y_obs2[n] ~ normal(b[1] + b[2] * x_mis[n], sigma2);  // second model for x_mis
  }
}
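To make the question concrete, here is the joint density I think the model block above implies. This is only a sketch of my understanding, assuming each `~` statement multiplies one more factor into the joint density (equivalently, adds one more term to the log density). The prior factors p(\mu), p(\sigma_1), p(b), p(\sigma_2) stand for the normal and half-normal priors in the code, and the indices i and j are my own notation.

```latex
\begin{aligned}
p(\mu, \sigma_1, b, \sigma_2, x_{mis} \mid x_{obs}, y_{obs1}, y_{obs2})
\;\propto\;& p(\mu)\, p(\sigma_1)\, p(b)\, p(\sigma_2) \\
&\times \prod_{j=1}^{N_{mis}} \mathcal{N}(x_{mis,j} \mid 0, 1) \\
&\times \prod_{i=1}^{N_{obs}} \mathcal{N}(x_{obs,i} \mid \mu, \sigma_1)
        \prod_{j=1}^{N_{mis}} \mathcal{N}(x_{mis,j} \mid \mu, \sigma_1) \\
&\times \prod_{i=1}^{N_{obs}} \mathcal{N}(y_{obs1,i} \mid b_1 + b_2\, x_{obs,i}, \sigma_2)
        \prod_{j=1}^{N_{mis}} \mathcal{N}(y_{obs2,j} \mid b_1 + b_2\, x_{mis,j}, \sigma_2)
\end{aligned}
```

If that reading is right, x_{mis} appears in three factors: the normal(0, 1) term, the distribution of x, and the regression likelihood. So the "two models" would just be two factors of one joint density rather than two competing estimates.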
The model fits without errors, but I don't understand why Stan can use two models to estimate the same missing data. I'm also not sure whether the model is valid. Any advice?