I am testing various parameterizations of a stanmodel on simulated data. All of those models are “true” in the sense that they reflect the data generation process. However, besides parameter recovery, I am interested in the predictive performance of those parameterizations.
As I can gather from BDA, LOO is an estimator for the expected log predictive density. However, since I know the true data generating process, I am able to generate new data and can calculate the predictive density on this new data.
The process in R looks like this representative code:
my_data <- my_data_generator(n)
my_model <- stan_model("my_model.stan")
# ...where my_model.stan has a generated quantities block with the log_lik
my_stanfit <- sampling(my_model, my_data)
my_draws <- as.matrix(my_stanfit)
# Generate new datasets
my_new_data <- replicate(n_rep, my_data_generator(n))
clppds <- sapply(
my_new_data,
function(dataset) {
this_gqs <- gqs(my_model, data = dataset, draws = my_draws)
this_log_lik <- as.matrix(this_gqs)
sum(log(colMeans(exp(this_log_lik))))
}
)
mean(clppds)
With the command sum(log(colMeans(exp(this_log_lik))))
I am calculating equation (7.5) from BDA on page 169:
\displaystyle\text{computed lppd} = \sum_{i=1}^n \log\left(\frac{1}{S}\sum_{s=1}^S p(y_i | \theta^s)\right).
And then I average over those clppds to get an estimate for
\displaystyle E_f(\log p_{\text{post}}(\tilde{y}))=\int\log p_{\text{post}}(\tilde{y})f(\tilde{y})d\tilde{y}=\int\log\left[\int p(\tilde{y} | \theta)p(\theta | y)d\theta\right]f(\tilde{y})d\tilde{y},
where f is the true data generating process, y is my original data and \tilde{y} is my new data.
My question now obviously is:
- Does that make sense?
- How can I compute a SE of that estimate, i.e. which distribution does it follow or how can I work this out?
- If I do this for multiple models, how do I compute the difference (naive, i.e. clppd_1 - clppd_2 or pointwise?) and the SE of this difference?
I did this for some models and compared it with LOO - the differences are quite big (more than 10*SE), so either they don’t estimate the same thing or something is wrong (at least both somewhat agree on the ordering).
model my_thing elpd_loo se_elpd_loo
bp_rep1 -7140.384 -6545.76 67.39
bs_rep1 -7221.498 -6537.59 67.34
bstz_rep1 -7253.259 -6545.77 67.46
bstz_dt_ll2 -7512.211 -7817.66 68.46
bp_dt_ll2 -7515.083 -7832.83 68.90
bs_dt_ll2 -7515.736 -7837.60 68.53
I would be really happy if someone could give an insight.
Thank you