Hello,
I am running a multilevel model for scores of individuals over time. In the generated quantities block I have two methods that come to my mind when trying to “predict/ replicate” the data, called y_pred and y_pred1.
data{
int<lower=0> Ni; // number of observations
int<lower=0> Nj; // number of individuals
int<lower=0> cluster[Ni]; // individual ID
vector[Ni] Y_ij; // Outcome score
int<lower=0> p; // Number of auxilliary(independent) variables and intercept
matrix[Ni,p] X_ij; // Design matrix of independent variables
vector[Ni] X1_ij; // years before death vector
int<lower=0> N_new; // number of posterior predictions
}
parameters{
real beta_0; // population intercept
real u_0j[Nj]; // random effects for each individual
real beta_1;
real u_1j[Nj]; // differential effect of age for each individual
real<lower=0> sigma_u0; // standard deviation of random effects of intercepts
real<lower=0> sigma_u1; // standard deviation of random effects of slopes
vector[p] beta; // fixed effects for independent variables
real<lower=0> sigma_v0; // standard deviation for random error
}
transformed parameters{
vector[Nj] beta_0j; // varying intercepts by individuals
vector[Nj] beta_1j; // varying slope for Years.bl by individuals
vector[Ni] mu_ij;
for(j in 1:Nj)
beta_0j[j] = beta_0 + u_0j[j];
for(j in 1:Nj)
beta_1j[j] = beta_1 + u_1j[j];
mu_ij = beta_0j[cluster] + beta_1j[cluster].*X1_ij + X_ij* beta; // random intercept and slopes for individuals
}
model{
//Priors and hyper-priors
u_0j ~ normal(0, sigma_u0);
u_1j ~ normal(0, sigma_u1);
sigma_u0 ~ normal(0, 2.5);
sigma_u1 ~ normal(0, 2.5);
sigma_v0 ~ normal(0, 2.5);
beta_0 ~ normal(0,100);
beta_1 ~ normal(0,100);
beta ~ normal(0,100);
//Likelihood
Y_ij ~ normal(mu_ij, sigma_v0);
}
generated quantities {
vector[Ni] y_pred; // predcitions for observed values
vector[Ni] y_pred1; // ALTERNATE predcitions for observed values
for(n in 1:Ni)
y_pred[n] = normal_rng(mu_ij[n], sigma_v0);
for(n in 1:Ni)
y_pred1[n] = normal_rng(beta_0 + normal_rng(0, sigma_u0) + (beta_1 + normal_rng(0, sigma_u1)).*X1_ij[n] + X_ij[n]* beta, sigma_v0);
}
I understand that y_pred and y_pred1 are different, but which out of these two is the correct way of trying to predict the given data as part of post predictive checks. Or if both are incorrect?!
Thank you!