I’m relatively new to stan and have a question regarding posterior predictive draws. I fitted the following hierarchical linear regression model and used the generated quantities block for posterior predictive draws. However, when I plot the densities of y and yrep, they differ a lot. Since I used N(0,1) as a prior for my parameters, I’m assuming this falls into the informative prior category and the model should be quite good at predicting outcomes.
data {
int<lower=0> N;
int<lower=0> K;
int group[N];
vector[N] y;
vector[N] x;
}
parameters {
vector[K] mu;
vector[K] beta;
real<lower=0> sigma;
real nu_mu;
real nu_beta;
real<lower=0> tau_mu;
real<lower=0> tau_beta;
}
model {
nu_mu ~ normal(0,1);
nu_b ~ normal(0,1);
tau_mu ~ gamma(1,1);
tau_b ~ gamma(1,1);
sigma ~ gamma(1,1);
mu ~ normal(nu_mu, tau_mu);
beta ~ normal(nu_b, tau_b);
for(n in 1:N)
y[n] ~ normal(mu[group[n]] + beta[group[n]]*x[n], sigma);
}
//posterior predictive draws
generated quantities {
vector[K] beta_pred;
vector[K] mu_pred;
vector[N] y_rep;
for (j in 1:Ngroup){
mu_pred[j] = normal_rng(nu_mu, tau_mu);
}
for (j in 1:Ngroup){
beta_pred[j] = normal_rng(nu_b, tau_b);
}
for (n in 1:N){
y_rep[n] = normal_rng(mu_pred[group[n]] + beta_pred[group[n]]*x[n], sigma);
}
}
//posterior predictive density plots
y_pred1 <-as.matrix(fit.4, pars="y_rep")
ppc_dens_overlay(rep(y,20), y_pred1[1:100,])
Just to be clear, the dimension of y_pred1 is 4000, 2000. 2000 because K=20 and N=100