I am a little confused about missing data imputation in Stan. As a simple example, suppose Y and X are related as follows:
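data {
  int<lower=0> N;       // number of observations
  vector[N] x;          // predictor
  vector[N] y;          // outcome
}
parameters {
  real alpha;           // intercept
  real beta;            // slope
  real<lower=0> sigma;  // residual standard deviation
}
model {
  vector[N] mu = alpha + beta * x;
  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  y ~ normal(mu, sigma);
}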
Suppose now we have a missing data problem: we observe all of the Ys (e.g. a vector of length n) but only some of the Xs (e.g. a vector of length 2n/3), and we are interested in imputing the missing values of X. In this case, should I put X in the ‘data’ block or the ‘parameters’ block?
I am also confused about the general idea of using Bayesian methods for missing data imputation. My understanding is that missing data should be treated as ‘parameters’ in a Bayesian setting. However, in the situation above we also observe 2n/3 of the Xs, so declaring all of X as ‘parameters’ does not seem to make sense. Should we treat the observed 2n/3 of X as data and the missing n/3 of X as parameters?
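To make the last question concrete, here is a rough sketch of the split I have in mind. It is only my guess, not something I know to be correct. The names N_obs, N_mis, ii_obs, ii_mis, x_obs, x_mis, mu_x and sigma_x are all made up for illustration, and the normal model for x (with its priors) is an assumption I added so that the missing values have a distribution to be drawn from. It also assumes a recent Stan version with the array[...] declaration syntax.

```stan
data {
  int<lower=0> N_obs;                                      // number of observed x (about 2n/3)
  int<lower=0> N_mis;                                      // number of missing x (about n/3)
  array[N_obs] int<lower=1, upper=N_obs + N_mis> ii_obs;   // positions of observed x
  array[N_mis] int<lower=1, upper=N_obs + N_mis> ii_mis;   // positions of missing x
  vector[N_obs] x_obs;                                     // observed x values
  vector[N_obs + N_mis] y;                                 // y is fully observed
}
transformed data {
  int<lower=0> N = N_obs + N_mis;
}
parameters {
  vector[N_mis] x_mis;      // missing x values, treated as unknowns
  real alpha;
  real beta;
  real<lower=0> sigma;
  real mu_x;                // assumed: mean of the x distribution
  real<lower=0> sigma_x;    // assumed: sd of the x distribution
}
transformed parameters {
  vector[N] x;              // full x: observed entries are data, missing are parameters
  x[ii_obs] = x_obs;
  x[ii_mis] = x_mis;
}
model {
  // assumed model for x; without something like this x_mis has no proper prior
  mu_x ~ normal(0, 100);
  sigma_x ~ normal(0, 100);
  x ~ normal(mu_x, sigma_x);

  alpha ~ normal(0, 100);
  beta ~ normal(0, 100);
  y ~ normal(alpha + beta * x, sigma);
}
```

If this is roughly right, the posterior draws of x_mis would be the imputed values, and the observed part of X would stay in ‘data’ as usual.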
Thanks!