Predicting the outcome values for multilevel models via generated quantities block

I am running a multilevel model for scores of individuals over time. In the generated quantities block I have two methods that come to my mind when trying to “predict/ replicate” the data, called y_pred and y_pred1.

  int<lower=0> Ni; // number of observations
  int<lower=0> Nj; // number of individuals
  int<lower=0> cluster[Ni]; // individual ID
  vector[Ni] Y_ij; // Outcome score
  int<lower=0> p; // Number of auxilliary(independent) variables and intercept
  matrix[Ni,p] X_ij; // Design matrix of independent variables
  vector[Ni] X1_ij; // years before death vector
  int<lower=0> N_new; // number of posterior predictions

  real beta_0; // population intercept
  real u_0j[Nj]; // random effects for each individual
  real beta_1;
  real u_1j[Nj]; // differential effect of age for each individual 
  real<lower=0> sigma_u0; // standard deviation of random effects of intercepts
  real<lower=0> sigma_u1; // standard deviation of random effects of slopes
  vector[p] beta; // fixed effects for independent variables
  real<lower=0> sigma_v0; // standard deviation for random error

transformed parameters{
  vector[Nj] beta_0j; // varying intercepts by individuals
  vector[Nj] beta_1j; // varying slope for by individuals
  vector[Ni] mu_ij;
  for(j in 1:Nj)
    beta_0j[j] = beta_0 + u_0j[j];
  for(j in 1:Nj)
    beta_1j[j] = beta_1 + u_1j[j];
    mu_ij = beta_0j[cluster] + beta_1j[cluster].*X1_ij + X_ij* beta; // random intercept and slopes for individuals

  //Priors and hyper-priors
  u_0j ~ normal(0, sigma_u0);
  u_1j ~ normal(0, sigma_u1);
  sigma_u0 ~ normal(0, 2.5);
  sigma_u1 ~ normal(0, 2.5);
  sigma_v0 ~ normal(0, 2.5);
  beta_0 ~ normal(0,100);
  beta_1 ~ normal(0,100);
  beta ~ normal(0,100);
  Y_ij ~ normal(mu_ij, sigma_v0);

generated quantities {
  vector[Ni] y_pred; // predcitions for observed values
  vector[Ni] y_pred1; // ALTERNATE predcitions for observed values
  for(n in 1:Ni)
  y_pred[n] = normal_rng(mu_ij[n], sigma_v0);

  for(n in 1:Ni)
  y_pred1[n] = normal_rng(beta_0 + normal_rng(0, sigma_u0) + (beta_1 + normal_rng(0, sigma_u1)).*X1_ij[n] + X_ij[n]* beta, sigma_v0);


I understand that y_pred and y_pred1 are different, but which out of these two is the correct way of trying to predict the given data as part of post predictive checks. Or if both are incorrect?!

Thank you!

1 Like

y_pred is correct

1 Like

Excellent. Thank you!