Hello,

I am seeking assistance regarding the necessary steps after implementing posterior predictive checks. I am replicating Jeffrey Arnold’s polling aggregation model using my own dataset.

See Simon Jackman's Bayesian Model Examples in Stan (jrnold.github.io).

The `ppc_dens_overlay` plot indicates that the simulated datasets do not align well with the observed data.

Similarly, the `ppc_stat_2d` plot shows that the simulated datasets tend to cluster below the mean of the observed data and above its standard deviation.
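For reference, the pair of test statistics that `ppc_stat_2d` plots can be computed directly from the replicated draws. A minimal NumPy sketch with made-up numbers (the bias and over-dispersion here are synthetic, chosen only to mimic the pattern described above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data and posterior-predictive draws
# (shape: n_draws x n_observations), standing in for real model output.
y = rng.normal(50.0, 3.0, size=200)
y_rep = rng.normal(48.0, 5.0, size=(1000, 200))  # biased low, over-dispersed

# Observed test statistics
obs_mean = y.mean()
obs_sd = y.std(ddof=1)

# The same pair of statistics for every simulated dataset
rep_means = y_rep.mean(axis=1)
rep_sds = y_rep.std(axis=1, ddof=1)

# Fractions far from 0.5 correspond to the clustering seen in ppc_stat_2d:
# most draws fall below the observed mean and above the observed sd.
p_mean_low = (rep_means < obs_mean).mean()
p_sd_high = (rep_sds > obs_sd).mean()
print(p_mean_low, p_sd_high)
```

When these fractions are close to 0 or 1, the model systematically mislocates or mis-scales the data, which is exactly the symptom the plot shows.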

My question is: what steps can I take to reduce the discrepancy between the observed data and the simulated data?

Here's the Stan code:

parameters {
  vector[T] omega_raw;
  real<lower = 0.> tau;
  vector[H] eta_raw;
  real<lower = 0.> zeta;
}
transformed parameters {
  vector[N] mu;
  vector[T] xi;
  vector[H] eta;
  eta = eta_raw * zeta;
  xi[1] = xi_init_loc + omega_raw[1] * xi_init_scale;
  for (t in 2:T) {
    xi[t] = xi[t - 1] + omega_raw[t] * tau;
  }
  for (i in 1:N) {
    mu[i] = xi[time[i]] + eta[house[i]];
  }
}
model {
  eta_raw ~ normal(3.5, 5.0);          // eta_raw ~ normal(0., 1.);
  zeta ~ normal(4.5, zeta_scale);      // zeta ~ normal(0., zeta_scale);
  tau ~ cauchy(0., 2.95 * tau_scale);  // tau ~ cauchy(0., tau_scale);
  omega_raw ~ normal(5.0, 7.0);        // omega_raw ~ normal(0., 1.);
  y ~ normal(mu, s);
}
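For completeness, the replicated data that the PPC plots consume has to come from somewhere; in Arnold's examples it is produced in a `generated quantities` block. A minimal sketch, assuming `mu`, `s`, and `N` are as in the model above and that `s` is a vector of per-observation standard deviations (the name `y_rep` is my choice):

```stan
generated quantities {
  // One replicated dataset per posterior draw, for use with
  // ppc_dens_overlay / ppc_stat_2d.
  vector[N] y_rep;
  for (i in 1:N) {
    y_rep[i] = normal_rng(mu[i], s[i]);
  }
}
```

If `s` is a scalar in your data block, use `normal_rng(mu[i], s)` instead.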

I've tried adjusting the priors by increasing their values, to show how far they can be pushed relative to Arnold's original model (the original priors are shown in the comments). Despite these adjustments, the posterior predictive checks show minimal change.

Any comments or suggestions on how to close the gap between the observed and simulated data would be greatly appreciated.