To my knowledge, using loo with brm_multiple objects calculates the elpd on the first dataset only.
Is there any consensus, or are there any ideas, on what LOO with multiply imputed datasets should look like?
I think ideally one would like a solution that evaluates the combined posterior of the m imputed datasets. So maybe something like:
- For each of the m imputed datasets, use standard PSIS-LOO to generate samples from the LOO posterior for leaving out data point y_{i}.
- Combine the LOO posterior draws
- Use the combined posterior to evaluate the lpd for each y_{i} of the imputed datasets.
Does this even make sense? I feel like I’m missing something obvious here.
Are you imputing just the covariates, or also the target variable y?
Assuming you are imputing just the covariates, and you are using R and the loo package:
- For each of the m imputed datasets, compute loo as usual.
- Each loo object has pointwise log predictive densities in loo_object$pointwise[,'elpd_loo'].
- Compute the means of the pointwise predictive densities (using exp(loo_object$pointwise[,'elpd_loo'])), where the means are taken over the m imputed datasets (the result has n pointwise predictive densities).
- elpd_loo is then the sum of the logs of the pointwise predictive densities from the previous step.
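The steps above can be sketched in base R. This is only an illustration under stated assumptions: combine_elpd_mi is a hypothetical helper (it is not part of the loo package), and its input is assumed to be a list of m numeric vectors of pointwise elpd_loo values, one per imputed dataset, all in the same observation order. The mean of the densities is computed on the log scale (log-mean-exp) to avoid underflow.

```r
# Hypothetical helper, not part of the loo package: combine pointwise
# elpd_loo values from m imputed datasets into a single elpd estimate.
combine_elpd_mi <- function(pointwise_list) {
  # All imputed datasets must contain the same n observations.
  stopifnot(length(unique(lengths(pointwise_list))) == 1)

  elpd_mat <- do.call(cbind, pointwise_list)  # n x m matrix of log densities
  m <- ncol(elpd_mat)

  # log(mean(exp(x))) per observation, shifted by the row maximum for stability.
  row_max <- apply(elpd_mat, 1, max)
  pointwise <- row_max + log(rowSums(exp(elpd_mat - row_max))) - log(m)

  list(pointwise = pointwise, elpd_loo = sum(pointwise))
}

# Toy input: made-up pointwise elpd values for n = 3 observations and
# m = 2 imputed datasets (illustrative numbers only, not real results).
toy <- list(c(-1.0, -2.0, -0.5), c(-1.2, -1.8, -0.7))
res <- combine_elpd_mi(toy)
print(res$elpd_loo)

# With real fits, assuming loo_list is a list of loo objects (one per dataset):
# pointwise_list <- lapply(loo_list, function(x) x$pointwise[, "elpd_loo"])
# res <- combine_elpd_mi(pointwise_list)
```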
Thank you!
Yes I’m mostly thinking about the case of just imputing the covariates.
So it is enough to calculate elpd on the individual datasets and then average? I felt a bit reluctant to do this because the posterior that is actually used later on is the combined posterior and here we are evaluating the individual posteriors instead.
No, averaging the elpd values is not enough. Calculate the pointwise predictive densities on the individual datasets, average them over the datasets, take the logarithm, and then sum the log pointwise predictive densities to get the elpd.
And no, this does not evaluate only the individual posteriors: averaging the pointwise predictive densities over the datasets makes the result use the combined posterior.
You just have to be careful what you average here.
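To make the distinction concrete, the recipe can be written as a formula. The notation is an assumption introduced here for illustration: \mathrm{elpd}_i^{(j)} denotes the pointwise elpd_loo of observation i computed on imputed dataset j, with n observations and m imputed datasets.

```latex
\widehat{\mathrm{elpd}}_{\mathrm{MI}}
  = \sum_{i=1}^{n} \log\!\left( \frac{1}{m} \sum_{j=1}^{m} \exp\!\left(\mathrm{elpd}_i^{(j)}\right) \right)
  \;\ge\; \frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{n} \mathrm{elpd}_i^{(j)}
```

The right-hand side is what averaging the per-dataset elpd totals would give. By Jensen's inequality the two agree only when, for every observation, the pointwise values are identical across the imputed datasets, so the two ways of averaging are generally not interchangeable.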
Is there a proper way to combine the other loo statistics across the imputed datasets as well, such as SE, p_loo, and k?