I have this simple regression model
y_{ij} = \mu_{i} + x_{ij}^{T}\beta + e_{ij},
where e_{ij} \sim N(0, \sigma^2_{i}) and e_{ij}, e_{ij'} are independent. Here i indexes ‘regions’, i = 1, 2, \ldots, K, and j indexes instances within each region, j = 1, 2, \ldots, N_{i}.
Sometimes I observe only a sample of the y_{ij}, say j = 1, 2, \ldots, n_{i} with n_{i} \ll N_{i} (but I still know all the data in X). When every n_{i} > 0, I can learn all the \mu_{i} and \sigma^2_{i} from the data, and then predict the missing y_{ij} with
p(y_{ij}^{\star} | y ) = \int p(y_{ij}^{\star} | \theta) \, p(\theta | y) \, d\theta,
provided that my model is valid for new observations. Here \theta=( \sigma^{2}_{1:K},\, \mu_{1:K},\, \beta).
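For context, when every region has data I would normally approximate this integral with posterior draws in a generated quantities block. A sketch (reusing the data and parameter names from the program below, with a hypothetical name `y_mis_rep` for the predictive draws):

```stan
generated quantities {
  // One posterior predictive draw per missing outcome:
  // y_mis_rep[m] | theta ~ N(mu[groupmis[m]] + X_mis[m] * beta, sigma[groupmis[m]])
  vector[N_mis] y_mis_rep;
  for (m in 1:N_mis) {
    y_mis_rep[m] = normal_rng(X_mis[m] * beta + mu[groupmis[m]],
                              sigma[groupmis[m]]);
  }
}
```

Since generated quantities are computed after sampling, these draws cannot feed information back into mu or sigma, which is exactly what my question below is about.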
On the other hand, I read in the Stan User’s Guide (ch. 3) that I can predict the missing y_{ij} by declaring this quantity as a parameter, so I coded:
data {
  int<lower=1> K;                              // regions
  int<lower=0> N_obs;                          // no. of observed y_{ij}
  int<lower=0> N_mis;                          // no. of missing y_{ij}
  array[N_obs] int<lower=1, upper=K> group;    // region of each observed y
  array[N_mis] int<lower=1, upper=K> groupmis; // region of each missing y
  int<lower=1> p;                              // predictors
  vector[N_obs] y_obs;
  matrix[N_obs, p] X_obs;
  matrix[N_mis, p] X_mis;
}
parameters {
  vector[p] beta;            // regression coefficients
  vector[K] mu;              // intercept for each region
  vector<lower=0>[K] sigma;  // sd for each region
  vector[N_mis] y_mis;       // missing y_{ij}
}
model {
  target += normal_id_glm_lupdf(y_obs | X_obs, mu[group], beta, sigma[group]);
  target += normal_id_glm_lupdf(y_mis | X_mis, mu[groupmis], beta, sigma[groupmis]);
}
Here `group` and `groupmis` are vectors of region indices of the form (1, 1, ..., 1, 2, 2, ..., 2, ..., K, K, ..., K). So `mu[group]` selects the mu entry of each region with n_{i} > 0 (repeated once per observation), and `mu[groupmis]` selects the mu entries of the regions with n_{i} = 0.
I am not very sure whether this code is correct:

- Do `sigma[groupmis]` and `mu[groupmis]` actually learn anything from `y_mis`, or is the only information they get through `mu[group]` and `sigma[group]`?
- When some n_{i} = 0: if we were using a Gibbs sampler/Metropolis algorithm, we could generate the missing y_{ij} (`y_mis`), then update `mu[groupmis]` and `sigma[groupmis]`, and so on. I know that Stan is based on HMC/NUTS, so is there an equivalent in Stan? Is the code above that equivalent?
For the sake of simplicity, I am using uniform priors on sigma, mu, and beta. Also, as initial values for `y_mis` I use the OLS predictions. The following R code illustrates this:
question-code.R (3.1 KB)
When n_{i} = 0 for some region, the estimates for mu[i] and sigma[i] are bad; when n_{i} > 0, the estimates are much better.
Thanks in advance.